I want to redirect the output of subprocess.call(...) to an xz- or bzip2-compressed file.

I tried:

with lzma.open(log_path, "x") as log_file:
    subprocess.call(command, stdout=log_file, stderr=log_file)

but the resulting file isn't a valid XZ-compressed file:

$ xzcat logfile.xz
xzcat : logfile.xz: Format de fichier inconnu

(which, in French, means "unknown file format").

When I use plain cat, the output is displayed correctly, with some weird data at the end (the command launched in the script is rsync):

$ cat logfile.xz
sending incremental file list
prog/testfile

sent 531.80K bytes  received 2.71K bytes  1.07M bytes/sec
total size is 14.21G  speedup is 26,588.26
�7zXZ�ִF�D!��}YZ

logfile.xz seems to be a semi-valid XZ archive file, filled with uncompressed data. What am I doing wrong?

PS: It works when I do something like this:

output = subprocess.check_output(command)
log_file.write(output)

...but given that the command takes a long time (it's a backup script), I want to be able to see the log (with xzcat) before the end, to know what rsync is doing.

Hey

1 Answer

The redirection happens at the file-descriptor level, before the child process is even executed. After that, no parent code touches the child's stdout/stderr, so the Python code in the lzma module never runs and the child's output is written to the file uncompressed.
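
To make this concrete, here is a minimal sketch that reproduces the symptom from the question. It assumes a child that just prints one line (a stand-in for rsync); the helper name `raw_bytes_reach_file` and the temporary path are made up for illustration.

```python
import lzma
import os
import subprocess
import sys
import tempfile


def raw_bytes_reach_file():
    """Return the bytes on disk after a child writes via an lzma file object."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "logfile.xz")
        with lzma.open(path, "x") as log_file:
            # subprocess only asks for log_file.fileno(); the child writes
            # straight to that descriptor and never calls LZMAFile.write().
            subprocess.call(
                [sys.executable, "-c", "print('hello from child')"],
                stdout=log_file,
            )
        # Closing the LZMAFile appends an (empty) XZ stream after the raw text.
        with open(path, "rb") as f:
            return f.read()


data = raw_bytes_reach_file()
```

The file should start with the child's plain text and end with the XZ magic bytes, which is the same layout as the `cat logfile.xz` output in the question.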

To compress on the fly, so that you can read the output while the child process is still running, redirect its output to the xz utility:

#!/usr/bin/env python3
import subprocess

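with open('logfile.xz', 'xb', 0) as log_file:
    # 2>&1 inside the shell sends the command's stderr through xz as well;
    # otherwise it would be written to the file uncompressed
    subprocess.call("command 2>&1 | xz -kezc -", shell=True,
                    stdout=log_file)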

Note: an ordinary open() is used, not lzma.open(): the compression is done in the xz subprocess.


If you want to compress in pure Python code, you have to pipe the data through Python:

#!/usr/bin/env python3
import lzma
from subprocess import Popen, PIPE, STDOUT
from shutil import copyfileobj

with lzma.open('logfile.xz', 'xb') as log_file, \
     Popen('command', stdout=PIPE, stderr=STDOUT) as process:
    copyfileobj(process.stdout, log_file)

Note: lzma.open() is used here, because the compression is done in the Python process.

jfs
  • Thanks! This "no parent code" thing is weird. I chose to use pure Python. – Hey Dec 06 '15 at 17:19
  • @YdobEmos nothing weird, it is how pipelines and redirection work in the shell: `command | another >output.txt 2>&1` – jfs Dec 06 '15 at 17:28
  • I find it unintuitive; I would expect the data I send to an LZMA file to be compressed before it's written. (I was talking about your "no parent code (related to child's stdout/stderr) is run after that (Python code from lzma module is not run)" sentence.) – Hey Dec 06 '15 at 17:33
  • Yes, I'm talking about the same thing. Only the result of `log_file.fileno()` is used: the file descriptor (an integer). It is clear that no compression occurs (no Python code is run) if you know [how the redirection works (`dup2()`)](http://stackoverflow.com/a/22434262/4279). It is not Python-specific. It might be educational to reimplement Popen using the POSIX interface. – jfs Dec 06 '15 at 23:30
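
As a rough illustration of the `dup2()` mechanism mentioned in the last comment, the redirection can be sketched with the POSIX calls that Python exposes in `os`. This is a simplified stand-in, not how `subprocess.Popen` is actually implemented; `run_redirected` and `demo` are hypothetical names, and the sketch assumes a POSIX system.

```python
import os
import sys
import tempfile


def run_redirected(argv, path):
    """Run argv with stdout and stderr redirected to a new file at path."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    pid = os.fork()
    if pid == 0:  # child process
        try:
            os.dup2(fd, 1)  # stdout now refers to the file
            os.dup2(fd, 2)  # stderr now refers to the file
            os.close(fd)
            os.execvp(argv[0], argv)  # replaces the child; no Python runs after
        finally:
            os._exit(127)  # only reached if exec failed
    os.close(fd)  # the parent no longer needs the descriptor
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


def demo():
    """Run a small child and return its exit code and the captured bytes."""
    child = "import sys; print('out'); print('err', file=sys.stderr)"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "out.log")
        code = run_redirected([sys.executable, "-c", child], path)
        with open(path, "rb") as f:
            return code, f.read()


exit_code, captured = demo()
```

Only the integer `fd` crosses into the child, so any wrapper object around the file (such as an `lzma` file object) has no chance to process the data.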