using gzip file as stdin for commands executed using subprocess.call

Question

I have a Python script that runs several commands with subprocess.call(). I need to feed data from a gzipped file to one of those commands via stdin, but no matter what I do, the command apparently receives the compressed data.

This is what I think should work:

import gzip
from subprocess import call

in_fname = 'test.gz'
out_fname = 'test.txt'

gz = gzip.open(in_fname, 'rb')
txt = open(out_fname, 'w')

call(['cat'], stdin=gz, stdout=txt)

But at the end, 'test.txt' is still compressed and is exactly the same size as the gzipped input file.

If I call gz.read() then I get the correct decompressed data, as expected. What do I need to do to use the gzipped file as stdin?

score 0 · Answer 1 · edited May 23 '17 at 12:34

After a bit of research: the root of the problem is that your operating system has no idea that the file handle for the gzipped file is anything special. gzip provides a "file-like" interface inside Python, but subprocess never calls its read() method. It hands the underlying file descriptor (gz.fileno()) to the child process, and that descriptor refers to the compressed file on disk. The subprocess (cat in this case) therefore reads the file byte for byte and writes out the compressed data unchanged.
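To see this for yourself, here is a minimal sketch that reads straight from the descriptor a child process would inherit. The helper name first_bytes_via_descriptor and the throwaway demo.gz file are made up for illustration; the only library behaviour relied on is that fileno() on a gzip file object returns the descriptor of the underlying compressed file.

```python
import gzip
import os
import tempfile


def first_bytes_via_descriptor(gz_path, n=2):
    """Read n bytes through the raw descriptor, bypassing gzip's read()."""
    with gzip.open(gz_path, 'rb') as gz:
        fd = gz.fileno()              # descriptor of the compressed file on disk
        os.lseek(fd, 0, os.SEEK_SET)  # make sure we read from the very start
        return os.read(fd, n)


# Build a small gzip file to inspect (hypothetical throwaway path).
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo.gz')
with gzip.open(demo_path, 'wb') as f:
    f.write(b'hello world\n')

# Prints b'\x1f\x8b', the gzip magic number, rather than b'he'.
print(first_bytes_via_descriptor(demo_path))
```

The child started by call() reads from that same descriptor, which is why it sees compressed bytes.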

My next idea was to read the whole file in Python (which knows it is compressed and will decompress it) and then pass the data to the subprocess. I tried wrapping the decompressed contents in a StringIO object, but that does not work, because a StringIO has no real file descriptor to hand to the child. Another answer ("Use StringIO as stdin with Popen") suggested a slightly different call to subprocess:

import gzip
from subprocess import Popen, PIPE

in_fname = 'test.gz'
out_fname = 'test.txt'

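# Decompress in Python; this loads the whole file into memory.
with gzip.open(in_fname, 'rb') as f:
    data = f.read()

# Feed the decompressed bytes to the command through a pipe.
with open(out_fname, 'wb') as txt:
    process = Popen(['cat'], stdin=PIPE, stdout=txt)
    process.communicate(data)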

This works. Note that it requires reading the whole file into memory, which may be a problem for really big files.
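If memory is the concern, one possible variation (not something I benchmarked) is to copy the decompressed data into the pipe in chunks instead of calling read() once. This is only a sketch: the function name stream_gzip_to_command and the 64 KiB chunk size are my own choices, and it assumes the command's stdout goes to a file, so nothing needs to be read back from the child while writing.

```python
import gzip
import shutil
from subprocess import Popen, PIPE


def stream_gzip_to_command(in_fname, cmd, out_fname, chunk_size=64 * 1024):
    """Decompress in_fname chunk by chunk into cmd's stdin; return its exit code."""
    with gzip.open(in_fname, 'rb') as gz, open(out_fname, 'wb') as out:
        process = Popen(cmd, stdin=PIPE, stdout=out)
        try:
            # Only one chunk of decompressed data is held in memory at a time.
            shutil.copyfileobj(gz, process.stdin, chunk_size)
        finally:
            # Closing stdin signals end of input to the child.
            process.stdin.close()
        return process.wait()
```

Usage would look like stream_gzip_to_command('test.gz', ['cat'], 'test.txt'). If the command exits before consuming all of its input, the copy raises BrokenPipeError, which a real script would probably want to handle.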

Yeah, special but not special :-/ It's a bit weird / unexpected that `gzip` provides file-like object which returns the compressed data - I wonder what would be the use case for that. Anyway, in the end I decided to simply call `gunzip` in one Popen, the other command in another Popen, and I read() from stdout of the first one and write() it to the other one. Reading the whole file into memory is not an option for me, as the files may be quite large. Also, this way the decompression is way faster than with decompression in Python. — JackieJack, May 01 '17 at 19:14
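For reference, a rough sketch of the two-process approach described in the comment above. It is a variation rather than the commenter's exact code: instead of calling read() and write() in Python, it connects gunzip's stdout directly to the second command's stdin. The function name gunzip_into_command is invented here, and the sketch assumes a gunzip executable is available on PATH.

```python
import subprocess


def gunzip_into_command(in_fname, cmd, out_fname):
    """Run `gunzip -c in_fname | cmd > out_fname`; return both exit codes."""
    with open(out_fname, 'wb') as out:
        # -c makes gunzip write the decompressed data to its stdout.
        unzip = subprocess.Popen(['gunzip', '-c', in_fname],
                                 stdout=subprocess.PIPE)
        consumer = subprocess.Popen(cmd, stdin=unzip.stdout, stdout=out)
        # Drop our copy of the pipe so gunzip is notified if cmd exits early.
        unzip.stdout.close()
        consumer_rc = consumer.wait()
        unzip_rc = unzip.wait()
    return unzip_rc, consumer_rc
```

Called as gunzip_into_command('test.gz', ['cat'], 'test.txt'), the data never passes through Python, so memory use stays flat regardless of file size.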
