I have been trying to use the seek function in a Perl script whose input file is in .gz format. I open the file with the following code:

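if ($ARGV[0] =~ /\.gz$/) {
    # Compressed input: read the decompressed data through a pipe from gunzip
    open(FH1, "-|", "gunzip", "-c", $ARGV[0]) || die("cannot open $ARGV[0]: $!");
}
else {
    # Plain input: open the file directly for reading
    open(FH1, "<", $ARGV[0]) || die("cannot open $ARGV[0]: $!");
}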

When seek is used on a plain text file it works fine, but when I give a .gz file as input, seek does not work properly.

Is there any alternative to seek in this situation, other than closing and reopening the file wherever seek is used?

  • 1
    Are you talking about [`seek`](https://perldoc.perl.org/functions/seek)? Seeking backwards in _any_ pipe stream is going to be hard :-) I suggest unpacking it into a temporary file and open that instead. – Ted Lyngmo Oct 19 '20 at 12:39
  • 1
    Seeking to a specific point in a `.gz` stream is impossible. This is an inherent limitation of the gzip format; see also [here](https://stackoverflow.com/questions/25985645/about-the-use-of-seek-on-gzip-files). The best you can do is start from the beginning and discard bytes until you reach your destination, or decompress the entire thing (into memory or onto disk) and do your random access there. – Thomas Oct 19 '20 at 12:39

1 Answer

The core IO::Uncompress::Gunzip module has limited support for seek when used to read a gzipped file (instead of piping through an external program as you're doing). From its documentation:

Provides a sub-set of the seek functionality, with the restriction that it is only legal to seek forward in the input file/buffer. It is a fatal error to attempt to seek backward.

Note that the implementation of seek in this module does not provide true random access to a compressed file/buffer. It works by uncompressing data from the current offset in the file/buffer until it reaches the uncompressed offset specified in the parameters to seek. For very small files this may be acceptable behaviour. For large files it may cause an unacceptable delay.
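As a rough illustration of the forward-only seek described above, here is a minimal sketch. The helper name `read_at` is made up for this example and is not part of the module; it assumes the caller only ever asks for offsets at or beyond the current position in the uncompressed data.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper (not part of IO::Uncompress::Gunzip): return up to
# $length bytes starting at uncompressed offset $offset of an already
# opened gunzip handle. The offset must not lie behind the current
# position, because the module treats a backward seek as a fatal error.
sub read_at {
    my ($z, $offset, $length) = @_;

    # Forward seek: the module decompresses and discards data until it
    # reaches $offset, so the cost grows with the distance skipped.
    $z->seek($offset, SEEK_SET)
        or die "could not reach offset $offset (past end of data?)";

    # read() with an explicit length tries to return exactly that many
    # bytes and reports errors with a negative status.
    my $buffer = '';
    my $status = $z->read($buffer, $length);
    die "read failed: $GunzipError" if $status < 0;

    return $buffer;
}
```

With a handle created by `IO::Uncompress::Gunzip->new($ARGV[0])`, calling `read_at($z, 100, 20)` and then `read_at($z, 500, 20)` should work, since both seeks move forward. Going back to an earlier offset afterwards dies, so that case still needs a fresh handle or a decompressed temporary copy, as the comments on the question suggest.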

Shawn