114

I have a list of gzip files:

file1.gz
file2.gz
file3.gz

Is there a way to concatenate or gzip these files into one gzip file without having to decompress them?

In practice we will use this in a web database (CGI): the site will receive a query from a user, list all the files matching the query, and return them to the user as a single batch file.

bdonlan
neversaint

4 Answers

142

With gzip files, you can simply concatenate the files together, like so:

cat file1.gz file2.gz file3.gz > allfiles.gz

Per the gzip RFC,

A gzip file consists of a series of "members" (compressed data sets). [...] The members simply appear one after another in the file, with no additional information before, between, or after them.

Note that this is not exactly the same as building a single gzip file of the concatenated data; among other things, all of the original filenames are preserved. However, gunzip seems to handle it as equivalent to a concatenation.
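The equivalence described above can be checked with a small experiment. This is only a sketch: the file names (`part1.txt`, `allfiles.gz`, and so on) are made up for the demonstration, and it assumes standard `gzip`, `gunzip`, `mktemp`, and `cmp` are available.

```shell
#!/bin/sh
# Sketch: concatenated gzip members should decompress to the concatenated data.
# All file names below are hypothetical and live in a throwaway directory.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf 'alpha\n' > part1.txt
printf 'beta\n'  > part2.txt
printf 'gamma\n' > part3.txt

# Compress each file separately, keeping the originals for comparison.
gzip -c part1.txt > part1.gz
gzip -c part2.txt > part2.gz
gzip -c part3.txt > part3.gz

# Concatenate the compressed files byte for byte; nothing is decompressed here.
cat part1.gz part2.gz part3.gz > allfiles.gz

# Decompress the multi-member file and compare it with the plain concatenation.
gunzip -c allfiles.gz > roundtrip.txt
cat part1.txt part2.txt part3.txt > expected.txt

if cmp -s roundtrip.txt expected.txt; then
    result="identical"
else
    result="different"
fi
echo "$result"
```

If the two files compare equal, `gunzip` treated the three members as one stream, which is the behavior the RFC excerpt leads you to expect.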

Since existing tools generally ignore the filename headers for the additional members, it's not easily possible to extract individual files from the result. If you want this to be possible, build a ZIP file instead. ZIP and GZIP both use the DEFLATE algorithm for the actual compression (ZIP optionally supports other compression algorithms as well; method 8 is the one that corresponds to GZIP's compression); the difference is in the metadata format. Since the metadata is uncompressed, it's simple enough to strip off the gzip headers and tack on ZIP file headers and a central directory record instead. Refer to the gzip format specification and the ZIP format specification.

Ken Williams
bdonlan
  • 1
    I tried `gzip file1.gz file2.gz file3.gz > allfiles.gz`. It failed. Was that what you meant? – neversaint Nov 04 '11 at 05:22
  • 46
    Nope. Just `cat file1.gz file2.gz file3.gz > allfiles.gz`. It really is that simple :) – bdonlan Nov 04 '11 at 05:23
  • Thanks. Is there a way I can preserve `f1.gz f2.gz f3.gz` in their existing gzip format, so that when I uncompress `allfiles.gz` it gives me the 3 files back? – neversaint Nov 04 '11 at 05:27
  • 2
    technically speaking, they are preserved. It's just that existing tools generally don't have the capability to extract them separately. You might want to look into building a ZIP header and directory - the ZIP format uses the same underlying compression algorithm, so it's just a matter of changing out the (uncompressed) metadata. Take a look at http://www.gzip.org/zlib/rfc-gzip.html (the source format) and http://www.pkware.com/documents/casestudies/APPNOTE.TXT . – bdonlan Nov 04 '11 at 05:30
  • 21
    Better than building a zip of gz files, just tar them. It's the same as the `cat` answer but with some extra metadata. You can later untar them to get the original file names, then unpack all or just a few as needed. – sorpigal Nov 04 '11 at 12:32
  • 2
    Many comments here are about `.zip` files. The standard way of putting multiple files together into one compressed archive using the gzip (or bzip2) algorithm is tar: `tar` puts the files together (uncompressed) and preserves file names and attributes, and gzip's job is to compress the result. This can even be done in one step using the `-z` option of `tar`. The resulting file extensions are `.tar.gz` or `.tgz`. If you want to put already compressed .gz files together, just use tar without `-z`; it doesn't do any further compression, which makes sense for already compressed files. – Daniel Alder Mar 26 '14 at 10:22
  • Shouldn't it be `zcat file1.gz file2.gz file3.gz > allfiles.gz`? Is that different from using `cat`? – alvas Aug 03 '15 at 13:49
  • 3
    @alvas, `zcat` decompresses its input, so that'd give you a decompressed output with a `.gz` extension. – bdonlan Aug 06 '15 at 01:33
  • 4
    Apparently there are some tools that will mistakenly stop when they reach the end of the first gzip'ed member. https://github.com/pysam-developers/pysam/issues/738#issuecomment-487958180 – Jeremy Leipzig Oct 25 '19 at 08:06
  • What if I have hundreds of `gz` files? Is `cat *.gz > allfiles.gz` enough? Do we need to pay attention to the order? – Tengerye Mar 18 '21 at 13:35
56

Here is what man 1 gzip says about your requirement.

Multiple compressed files can be concatenated. In this case, gunzip will extract all members at once. For example:

gzip -c file1  > foo.gz
gzip -c file2 >> foo.gz

Then

gunzip -c foo

is equivalent to

cat file1 file2

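Note that file1 here is uncompressed. If your input is already compressed (file1.gz), append it with cat file1.gz >> foo.gz instead; running gzip -c on it would compress it a second time.

Note this in particular:

gunzip will extract all members at once

So to extract the members individually, you will need an additional tool, or you will have to write one yourself.

However, this is also addressed in the man page.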

If you wish to create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip. GNU tar supports the -z option to invoke gzip transparently. gzip is designed as a complement to tar, not as a replacement.
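As a rough illustration of the archiver route, the sketch below bundles already-compressed files with plain `tar` (no `-z`, since the members are compressed already) and then pulls a single member back out. The names `bundle.tar` and `out` are hypothetical, and the sample files are generated on the spot.

```shell
#!/bin/sh
# Sketch: bundle existing .gz files with tar and extract one member later.
# bundle.tar and out/ are made-up names; everything lives in a temp directory.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Create three small gzip files to stand in for file1.gz, file2.gz, file3.gz.
printf 'first\n'  | gzip -c > file1.gz
printf 'second\n' | gzip -c > file2.gz
printf 'third\n'  | gzip -c > file3.gz

# Bundle them without -z: the members are already compressed.
tar -cf bundle.tar file1.gz file2.gz file3.gz

# List the archive; the original file names are preserved.
members=$(tar -tf bundle.tar)
echo "$members"

# Extract only one member into a separate directory, then decompress it.
mkdir out
tar -xf bundle.tar -C out file2.gz
extracted=$(gunzip -c out/file2.gz)
echo "$extracted"
```

Unlike the concatenated gzip stream, the tar archive keeps each file addressable by name, at the cost of a small amount of tar metadata per member.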

hellow
Nehal Dattani
21

Just use cat. It is very fast (0.2 seconds for 500 MB for me)

cat *gz > final
mv final final.gz

You can then read the output with zcat to make sure it's pretty:

zcat final.gz

I tried the other answer's 'gzip -c' approach, but I ended up with garbage when using already gzipped files as input (I guess it compressed them a second time).

PV:

Better yet, if you have it, 'pv' instead of cat:

pv *gz > final
mv final final.gz

This gives you a progress bar as it works, but does the same thing as cat.
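When many files are combined this way, the order of the members and the name of the output file both matter. The following is a minimal sketch with hypothetical `chunk-NN.gz` names: it relies on the shell expanding the glob in sorted order before `cat` runs, uses zero-padded numbers so that sorted order matches the intended order, and writes to a temporary name that the glob cannot match.

```shell
#!/bin/sh
# Sketch: concatenate many .gz files in a predictable order.
# The chunk-NN.gz and combined.* names are made up for this example.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Zero-padded numbers make the sorted glob order match the intended order.
printf 'one\n'   | gzip -c > chunk-01.gz
printf 'two\n'   | gzip -c > chunk-02.gz
printf 'three\n' | gzip -c > chunk-10.gz

# The shell expands chunk-*.gz before cat starts, and the output name
# does not match the pattern, so the output can never be read as an input.
cat chunk-*.gz > combined.tmp
mv combined.tmp combined.gz

# Decompress the result to confirm the members come out in order.
contents=$(gunzip -c combined.gz)
echo "$contents"
```

Without zero padding, a name like `chunk-10.gz` would sort before `chunk-2.gz`, so the decompressed data would come out in a different order than the numbering suggests.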

matiu
  • 1
    This was the best answer for me because I have many .gz files broken into smaller files i.e. .gz.ae, .gz.ab. So, I just did "cat *gz* > final.gz" – zipline86 Apr 01 '21 at 22:12
  • 1
    I was concerned that the `*gz` would pick up the `final.gz` as well and do some weirdo cyclical thing, but now I know it expands the `*gz` at the start and turns it into one big command. eg `cat a.gz b.gz c.gz ... > final.gz` - so if `final.gz` doesn't exist at the beginning it won't get sucked in. – matiu Apr 05 '21 at 02:41
11

You can create a tar file of these files and then gzip the tar file to create the new gzip file:

tar -cvf newcombined.tar file1.gz file2.gz file3.gz
gzip newcombined.tar
Drona
  • 9
    Why exactly should you gzip the new tar file? It's already zipped (apart from tar's metadata, which should be small). – thiton Nov 06 '11 at 19:07
  • 2
    You are right. There would not be much difference in the file size whether or not you gzip it because the individual files are already gzipped. It is just because he wanted to have gzip file out of the three individual files. – Drona Nov 06 '11 at 19:13
  • 3
    The extra gzip just slows down access to the content for no gain. It seems to me that the OPs requirement is really that the resultant archive be a single file, and there's no reason to suppose that the resultant file should be a gzip file. – mc0e Nov 24 '14 at 08:37