
In bash, you can concatenate gzipped files, and the result is a valid gzipped file. As far as I recall, I have always been able to treat these "concatenated" files as normal gzipped files (example code from the link above):

echo 'Hello world!' > hello.txt
echo 'Howdy world!' > howdy.txt
gzip hello.txt 
gzip howdy.txt

cat hello.txt.gz howdy.txt.gz > greetings.txt.gz

gunzip greetings.txt.gz

cat greetings.txt

Which outputs

Hello world!
Howdy world!

However, when I try to read this same file using Perl's core IO::Uncompress::Gunzip module, it stops after the contents of the first original file. Here is the result:

./my_zcat greetings.txt.gz
Hello world!

Here is the code for my_zcat:

#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;

use IO::Uncompress::Gunzip qw($GunzipError);

my $file_name = shift;

my $fh = IO::Uncompress::Gunzip->new($file_name) or die $GunzipError;

while (defined(my $line = readline $fh))
{
    print $line;
}

If I fully decompress the files before creating a new gzipped file, I don't have this problem:

zcat hello.txt.gz howdy.txt.gz | gzip > greetings_via_zcat.txt.gz
./my_zcat greetings_via_zcat.txt.gz
Hello world!
Howdy world!

So, what is the difference between greetings.txt.gz and greetings_via_zcat.txt.gz, and why might IO::Uncompress::Gunzip not work correctly with greetings.txt.gz?

Based on this answer to another question, I'm guessing that IO::Uncompress::Gunzip messes up because of the metadata between the files. But, since greetings.txt.gz is a valid Gzip file, I would expect IO::Uncompress::Gunzip to work.
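One way to see the difference between the two files is the rough shell sketch below. It assumes gzip, cat, cmp, wc, and mktemp are available, and it works in a temporary directory (the variable names are only for illustration). It checks that greetings.txt.gz is exactly the two original gzip members joined back to back, while greetings_via_zcat.txt.gz is a different, single-member file that gzip decompresses to the same text.

```shell
# Sketch: compare a concatenated gzip file with a re-compressed one.
# Assumes gzip, cat, cmp, wc and mktemp are on PATH; file names follow the question.
workdir=$(mktemp -d)
cd "$workdir"

printf 'Hello world!\n' > hello.txt
printf 'Howdy world!\n' > howdy.txt
gzip hello.txt   # produces hello.txt.gz
gzip howdy.txt   # produces howdy.txt.gz

# Two complete gzip members, back to back.
cat hello.txt.gz howdy.txt.gz > greetings.txt.gz
# One member holding both lines.
gzip -dc hello.txt.gz howdy.txt.gz | gzip > greetings_via_zcat.txt.gz

# The concatenated file is just the two members joined, so the sizes add up.
parts_size=$(( $(wc -c < hello.txt.gz) + $(wc -c < howdy.txt.gz) ))
joined_size=$(( $(wc -c < greetings.txt.gz) ))

# gzip itself decompresses every member, so both files yield the same text...
text_joined=$(gzip -dc greetings.txt.gz)
text_single=$(gzip -dc greetings_via_zcat.txt.gz)

# ...even though the compressed bytes differ.
if cmp -s greetings.txt.gz greetings_via_zcat.txt.gz; then
    bytes_differ=no
else
    bytes_differ=yes
fi

echo "parts=$parts_size joined=$joined_size bytes_differ=$bytes_differ"
[ "$text_joined" = "$text_single" ] && echo "decompressed text is identical"
```

So a reader that stops at the end of the first member would see only the first line of greetings.txt.gz, but everything in greetings_via_zcat.txt.gz.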

My workaround for now will be piping from zcat (which of course doesn't help Windows users much):

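#!/usr/bin/env perl
use strict;
use warnings;
use v5.10;

my $file_name = shift;

# List form avoids passing the file name through a shell; check that zcat started
open(my $fh, '-|', 'zcat', $file_name) or die "Cannot run zcat: $!";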

while (defined(my $line = readline $fh))
{
    print $line;
}
Christopher Bottoms
  • For reference, this question has also been posted to the module's bug tracker: https://rt.cpan.org/Public/Bug/Display.html?id=119184 – melpomene Dec 08 '16 at 19:13
  • @melpomene Ticket marked as resolved since it is documented in [IO::Compress](https://metacpan.org/pod/distribution/IO-Compress/lib/IO/Compress/FAQ.pod#Dealing-with-concatenated-gzip-files) to use the `MultiStream` option to deal with this. – Christopher Bottoms Feb 01 '17 at 20:20

1 Answer


This is covered explicitly in the IO::Compress FAQ section Dealing with concatenated gzip files. Basically you just have to include the MultiStream option when you construct the IO::Uncompress::Gunzip object.

Here is a definition of the MultiStream option:

MultiStream => 0|1

If the input file/buffer contains multiple compressed data streams, this option will uncompress the whole lot as a single data stream.

Defaults to 0.

So your code needs this change:

my $fh = IO::Uncompress::Gunzip->new($file_name, MultiStream => 1) or die $GunzipError;
melpomene
pmqs
  • That is perfect. I'm laughing out loud right now, because I remember searching for `IO::Uncompress::Gunzip` and skipping over hits to `IO::Compress` documentation because that was a "different" module. – Christopher Bottoms Dec 08 '16 at 21:54