6

I am currently in this directory:

/data/real/test

When I run ls -lt at the command prompt, I get something like this:

REALTIME_235000.dat.gz
REALTIME_234800.dat.gz
REALTIME_234600.dat.gz
REALTIME_234400.dat.gz
REALTIME_234200.dat.gz

How can I consolidate the five dat.gz files above into one dat.gz file in Unix without any data loss? I am new to Unix and not sure how to do this. Can anyone help?

Update:

I am not sure which way is best: should I unzip each of the five files and then combine them into one, or combine the five dat.gz files directly into one dat.gz?

AKIWEB

2 Answers

12

If the order in which the files' contents are concatenated doesn't matter, the following command will do the trick:

zcat REALTIME*.dat.gz | gzip > out.dat.gz

Update

This should solve the ordering problem (ls -t lists the newest file first):

zcat $(ls -t REALTIME*.dat.gz) | gzip > out.dat.gz
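A minimal sketch of the same decompress-and-recompress approach, under two assumptions that are not in the original answer: that the time stamp embedded in each file name sorts the files chronologically (the shell expands a glob in sorted name order), and that gzip -dc is available, which writes the decompressed data to standard output much as GNU zcat does. The scratch directory and sample contents are made up for illustration.

```shell
# Hypothetical demo data: three small gzip files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'first\n'  | gzip > REALTIME_234200.dat.gz
printf 'second\n' | gzip > REALTIME_234400.dat.gz
printf 'third\n'  | gzip > REALTIME_234600.dat.gz

# The glob expands in sorted name order, so the oldest time stamp comes first here.
# gzip -dc decompresses to stdout; the combined stream is recompressed as one file.
gzip -dc REALTIME_*.dat.gz | gzip > out.dat.gz

# Check the integrity of the result without writing the decompressed data.
gzip -t out.dat.gz
```

Because out.dat.gz does not match the REALTIME_*.dat.gz pattern, rerunning the pipeline does not feed the output back into itself.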
Ivan Nevostruev
  • I tried `zcat *.gz | gzip > out.dat.gz` and got this error for all five files: `REALTIME_EXPORT_v1x0_20120801_9_T_234000_234200.dat.gz.Z: No such file or directory`. Why is that? – AKIWEB Aug 02 '12 at 21:02
  • @Nevzz03 I can't reproduce this problem. I'm using **bash** on linux and same file names. – Ivan Nevostruev Aug 02 '12 at 21:11
  • @Nevzz03 Are you on Solaris instead of Linux? If so, use `gzcat *.gz | gzip > out.dat.gz` instead. The `zcat` utility on Solaris works with a different compression suite (`compress` and `uncompress`) that uses `.Z` as a suffix instead of `.gz`. This might also be the case on other non-Linux Unixen (AIX, etc.)... – twalberg Aug 02 '12 at 21:15
  • As my comment above shows, an extra `Z` gets appended to the file name (`dat.gz.Z`). Why is that? – AKIWEB Aug 02 '12 at 21:16
  • Please see answer by Mark Adler. ~1000 times faster and more correct. – Morlock Nov 01 '13 at 20:34
  • @Morlock Mark is using `cat`, so it won't work with the compressed files the OP is asking about. – Ivan Nevostruev Nov 01 '13 at 22:50
  • @IvanNevostruev Yes it will, that is the beauty of the gzip format. Whether you cat a.txt and b.txt and then gzip the result, or gzip them both and then cat the two .gz files, you get two archives that decompress to exactly the same content. To verify, unzip the two archives and compare them with md5sum. (I just retried it to confirm.) That is why Mark Adler pointed out that it is unnecessary to decompress and then recompress them. – Morlock Nov 07 '13 at 18:43
5

What do you want to happen when you gunzip the result? If you want the five files to reappear, then you need to use something other than the gzip (.gz) format. You would need to either use tar (.tar.gz) or zip (.zip).
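For the first case, a minimal sketch using tar might look like the following. The file names, contents, and the restored directory are hypothetical. The sketch deliberately writes a plain .tar rather than a .tar.gz, on the assumption that recompressing files that are already gzip-compressed gains little. The -C option for choosing the extraction directory works with GNU tar and BSD tar; other tar implementations may differ.

```shell
# Hypothetical demo data: two small gzip files in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'a\n' | gzip > REALTIME_234200.dat.gz
printf 'b\n' | gzip > REALTIME_234400.dat.gz

# Bundle the already-compressed files into one archive (no -z: they are gzip files already).
tar -cf realtime.tar REALTIME_*.dat.gz

# List the archive members, then extract them into a separate directory.
tar -tf realtime.tar
mkdir restored
tar -xf realtime.tar -C restored
```

After extraction, the original .gz files reappear unchanged under restored/.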

If you want the result of the gunzip to be the concatenation of the gunzip of the original files, then you can simply cat (not zcat or gzcat) the files together. gunzip will then decompress them to a single file.

cat [files in whatever order you like] > combined.gz

Then:

gunzip combined.gz

will produce an output that is the concatenation of the gunzip of the original files.

The suggestion to decompress them all and then recompress them as one stream is completely unnecessary.
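The claim can be checked with a small experiment. The sketch below uses made-up file names and contents: it compresses two text files separately, concatenates the compressed files with cat, and compares the decompressed result with the plain concatenation of the originals.

```shell
# Hypothetical demo data in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'alpha\n' > a.txt
printf 'beta\n'  > b.txt
gzip -c a.txt > a.gz
gzip -c b.txt > b.gz

# Concatenate the compressed files byte for byte; nothing is decompressed here.
cat a.gz b.gz > combined.gz

# Decompress the multi-member file and compare it with the plain concatenation.
gzip -dc combined.gz > combined.txt
cat a.txt b.txt > expected.txt
cmp combined.txt expected.txt && echo "identical"
```

This works because a gzip file may consist of several members stored back to back, and gunzip decompresses them in sequence as if they were one stream.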

Mark Adler