How to Convert Compressed Text: bzip2 to gzip without Temporary Files Multithreaded

Posted on Updated on

The tools that you will need is: lbzip2, pigz and split.

lbzip2 => http://lbzip2.org/
pigz => http://www.zlib.net/pigz/

If you are using Ubuntu (I’m using 14.04 LTS).
You can easily install lbzip2 and pigz by using apt-get (aptitude).

$ apt-get install lbzip2 pigz

Or you could download the source code from their website and compile manually.

Let say you have 500gb of text compressed bzip2 files called file01.txt.bz2 and you want to split that file to a multiple gziped files with 1500000 lines each, so it will be able to be processed faster in your hadoop cluster.

$ lbzcat file01.txt.bz2 | split -d -a 10 -l1500000 --filter='pigz > newfile01-$FILE.gz'

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s