5

I have 575 bz2 files averaging about 3G each, and I need to convert them to .gz format to make them compatible with a downstream pipeline.

$ ll -h | head
total 1.4T
drwxrws---+ 1 dz33 dcistat  24K Aug 23 09:21 ./
drwxrws---+ 1 dz33 dcistat  446 Aug 22 11:57 ../
-rw-rw----  1 dz33 dcistat 2.0G Aug 22 11:38 DRR091550_1.fastq.bz2
-rw-rw----  1 dz33 dcistat 2.0G Aug 22 11:38 DRR091550_2.fastq.bz2
-rw-rw----  1 dz33 dcistat 2.0G Aug 22 11:38 DRR091551_1.fastq.bz2
-rw-rw----  1 dz33 dcistat 2.0G Aug 22 11:38 DRR091551_2.fastq.bz2
-rw-rw----  1 dz33 dcistat 1.9G Aug 22 11:38 DRR091552_1.fastq.bz2
-rw-rw----  1 dz33 dcistat 1.9G Aug 22 11:38 DRR091552_2.fastq.bz2
-rw-rw----  1 dz33 dcistat 1.8G Aug 22 11:38 DRR091553_1.fastq.bz2

$ ll | wc -l
575

For a single file I can probably run bzcat a.bz2 | gzip -c >a.gz, but I am wondering how to convert all of them with one command or loop in bash on Linux.

David Z
  • This might help https://stackoverflow.com/questions/14505047/loop-through-all-the-files-with-a-specific-extension – marcusshep Aug 25 '17 at 15:51

2 Answers

6

Do them simply and fast in parallel with GNU Parallel:

parallel --dry-run 'bzcat {} | gzip -c > {.}.gz' ::: *bz2

Sample Output

bzcat a.bz2 | gzip -c > a.gz
bzcat b.bz2 | gzip -c > b.gz
bzcat c.bz2 | gzip -c > c.gz

If you like how it looks, remove the --dry-run. Maybe add a progress meter with --bar or --progress.
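
For the real run, something along these lines might work. It is only a sketch: the helper name convert_parallel and the default of 8 jobs are assumptions made for illustration, not part of GNU Parallel or of the question. It caps the number of simultaneous jobs with -j, shows a progress bar with --bar, and then tests every resulting archive with gzip -t.

```shell
#!/usr/bin/env bash
# Sketch only: convert_parallel is a hypothetical helper name.
# Usage: convert_parallel [JOBS]   (run from the directory holding the .bz2 files)
convert_parallel() {
  local jobs="${1:-8}"   # 8 is an arbitrary assumption; tune it to your cores and disk

  # {} is the input file; {.} is the same name with its last extension removed.
  parallel -j "$jobs" --bar 'bzcat {} | gzip -c > {.}.gz' ::: *.bz2 || return 1

  # gzip -t tests integrity: it is silent and exits 0 when an archive is intact.
  parallel -j "$jobs" 'gzip -t {}' ::: *.gz
}
```

Recompression is CPU-bound, but with 1.4T of input the disk may become the bottleneck first, so a job count below the number of cores can turn out to be faster.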

Mark Setchell
  • This is very helpful! Thanks! – David Z Aug 25 '17 at 17:26
  • No need to thank me, Stack Overflow's @OleTange is the wizard behind **GNU Parallel** - ensuring everyone gets good value from all those CPU cores that they paid Intel so handsomely for! Good luck with your project and feel free to come back to SO if you have any further questions - answers are free :-) – Mark Setchell Aug 25 '17 at 18:13
2

In a terminal, change directory to the one containing the .bz2 files, then use the following command:

for f in *.bz2; do bzcat "$f" | gzip -c >"${f%.*}.gz"; done

This processes the files one at a time and gives each .gz file the base name of its .bz2 source.

Example: DRR091550_1.fastq.bz2 will become DRR091550_1.fastq.gz.
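
Since a sequential pass over this many large files can run for a long time, a restartable variant may be worth having. The following is a minimal sketch, assuming bash; the function name convert_dir and the .tmp suffix are hypothetical choices for illustration. It skips files that already have a .gz counterpart, writes to a temporary name, and renames only after gzip -t confirms the archive is intact.

```shell
#!/usr/bin/env bash
# Sketch only: convert_dir is a hypothetical helper name.
# Usage: convert_dir [DIR]   (DIR defaults to the current directory)
set -o pipefail   # make a bzcat failure fail the whole pipeline, not just gzip

convert_dir() {
  local dir="${1:-.}" f out
  for f in "$dir"/*.bz2; do
    [ -e "$f" ] || continue        # the glob matched nothing
    out="${f%.bz2}.gz"
    [ -e "$out" ] && continue      # already converted on an earlier run
    # Write under a temporary name so an interrupted run never leaves a
    # half-written file under the final name.
    if bzcat "$f" | gzip -c > "$out.tmp" && gzip -t < "$out.tmp"; then
      mv "$out.tmp" "$out"
    else
      echo "failed: $f" >&2
      rm -f "$out.tmp"
    fi
  done
}
```

Calling convert_dir again after an interruption picks up where the previous run stopped, because finished files are skipped.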

user3439894
  • Thanks! I later found that `for f in *.bz2; do bzcat "$f" | gzip -c >"${f%.*}.gz" & done` will run them all together :) – David Z Aug 25 '17 at 18:20
  • @David Z, yes, it will do all the targeted files in the directory; however, it processes them one at a time until done. Mark Setchell's answer using GNU Parallel is better if you have a multi-core processor, as it does them in parallel. – user3439894 Aug 25 '17 at 18:27
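
A note on the backgrounded loop from the comments: putting & after each pipeline does not wait between files, so with 575 inputs all 575 conversions would start at once. If GNU Parallel is not available, a bounded alternative could look like the sketch below. It swaps in xargs with the -0 and -P options (supported by GNU and BSD xargs, not required by POSIX); the helper name convert_bounded and the default of 4 jobs are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: convert_bounded is a hypothetical helper name.
# Usage: convert_bounded [JOBS]   (run from the directory holding the .bz2 files)
convert_bounded() {
  local jobs="${1:-4}"   # 4 is an arbitrary assumption, not a recommendation
  # NUL-separated names survive spaces; -P caps the number of concurrent jobs
  # and -I{} hands exactly one file name to each sh invocation.
  printf '%s\0' *.bz2 |
    xargs -0 -P "$jobs" -I{} sh -c 'bzcat "$1" | gzip -c > "${1%.bz2}.gz"' _ {}
}
```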