22

I can run this normally on the command line in Linux:

$ tar c my_dir | md5sum

But when I try to call it with Python I get an error:

>>> subprocess.Popen(['tar','-c','my_dir','|','md5sum'],shell=True)
<subprocess.Popen object at 0x26c0550>
>>> tar: You must specify one of the `-Acdtrux' or `--test-label'  options
Try `tar --help' or `tar --usage' for more information.
Greg
  • 1
    Why are you hashing a tar file? Do you mean to be looking for changes in file contents? or verify an externally created tar file? – tMC Sep 06 '11 at 18:27
  • Perhaps see also https://stackoverflow.com/questions/24306205/file-not-found-error-when-launching-a-subprocess-containing-piped-commands – tripleee Feb 07 '21 at 09:22
  • @tMC: and how does this comment help with the actual problem and question ??? – Swifty Jan 11 '23 at 11:30

5 Answers

22

You have to use subprocess.PIPE to connect the two processes. Also, to split the command, you should use shlex.split(), which prevents strange behaviour in some cases:

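from subprocess import Popen, PIPE
from shlex import split
p1 = Popen(split("tar -c mydir"), stdout=PIPE)
p2 = Popen(split("md5sum"), stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # lets tar receive SIGPIPE if md5sum exits first
output = p2.communicate()[0]  # waits for md5sum and collects its output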

But to make an archive and generate its checksum, you should use Python's built-in tarfile and hashlib modules instead of calling shell commands.
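
A minimal sketch of the tarfile and hashlib route, building the archive in memory and hashing its bytes. The helper name `md5_of_tar` is made up for illustration, and the digest may differ from what `tar -c mydir | md5sum` prints, because the archive format and header metadata written by tarfile are not necessarily the same as those written by the tar command:

```python
import hashlib
import io
import tarfile


def md5_of_tar(path):
    """Archive `path` into an in-memory tar and return its MD5 hex digest.

    Hypothetical helper for illustration; not part of any library.
    """
    buf = io.BytesIO()
    # mode="w" writes an uncompressed tar stream into the buffer
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(path)  # adds directories recursively by default
    return hashlib.md5(buf.getvalue()).hexdigest()
```

Calling `md5_of_tar("mydir")` returns the digest as a string. For very large directories an on-disk archive would be a better fit than an in-memory buffer.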

mdeous
  • tarfile, and hashlib would be preferable. But how do I hash a tarfile object? – Greg Sep 06 '11 at 17:55
  • 1
    @Greg don't hash the tarfile object, open the resulting file like any other file using `open()` and then hash its content. – mdeous Sep 06 '11 at 18:02
  • Makes sense. That works but I get a different hash value than from the original command. Is that to be expected? – Greg Sep 06 '11 at 18:17
  • 1
    @Greg, this should do the same exact thing as `tar -c mydir | md5sum`. Perhaps you could start a new question, including an interactive terminal session where you run this command, start Python, and run the Python commands, displaying the output. – Mike Graham Sep 06 '11 at 18:49
  • Perhaps also mention that you have to call `communicate` on the final `Popen` object, or switch to a modern wrapper like `subprocess.run`. For many cases, simply pass in a string with `shell=True` if you want to use shell features like pipes, variables, redirection, job control, etc. Or as the answer suggests, run as little as possible in a subprocess and replace shell commands with native Python where you can (in which case you can often avoid [the security implications of `shell=True`](https://stackoverflow.com/questions/3172470/actual-meaning-of-shell-true-in-subprocess) by removing it). – tripleee Dec 22 '22 at 09:57
10

Ok, I'm not sure why but this seems to work:

subprocess.call("tar c my_dir | md5sum",shell=True)

Anyone know why the original code doesn't work?

Greg
  • 2
    The pipe | is a character the shell understands to connect command inputs and outputs together. It is not an argument that tar understands, nor a command. You're trying to execute everything as arguments to the tar command, unless you create a subshell. – tMC Sep 06 '11 at 17:51
  • 4
    This works because the entire command is passed to the *shell* and the *shell* understands the `|`. Popen calls the process and passes in the arguments directly. For Popen this is controlled with `shell=` and passing a string (not a list), IIRC. –  Sep 06 '11 at 17:52
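
The difference between a list and a string under `shell=True` can be seen with a small experiment. This is only a sketch and assumes a POSIX system, where the documented behaviour is that the first list item is the command string and the remaining items become arguments to the shell itself. The variable names are just for illustration:

```python
import subprocess

# List + shell=True: only "echo" is the shell command. "hi" becomes an
# argument to sh itself, so echo runs with no arguments.
list_result = subprocess.run(
    ["echo", "hi"], shell=True, capture_output=True, text=True
)

# String + shell=True: the whole line goes to the shell, which
# interprets the pipe character.
string_result = subprocess.run(
    "echo hi | tr a-z A-Z", shell=True, capture_output=True, text=True
)

print(repr(list_result.stdout))
print(repr(string_result.stdout))
```

This matches the error in the question: tar was started on its own, without the `-c` option, so it complained that no operation was specified.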
4

What you actually want is to run a shell subprocess with the shell command as a parameter:

>>> subprocess.Popen(['sh', '-c', 'echo hi | md5sum'], stdout=subprocess.PIPE).communicate()
('764efa883dda1e11db47671c4a3bbd9e  -\n', None)
Dag
2

I would try this, tested on Python v3.8.10:

import subprocess
proc1 = subprocess.run(['tar c my_dir'], stdout=subprocess.PIPE, shell=True)
proc2 = subprocess.run(['md5sum'], input=proc1.stdout, stdout=subprocess.PIPE, shell=True)
print(proc2.stdout.decode())

Key points (as outlined in my solution to a related question, https://stackoverflow.com/a/68323133/12361522):

  • use subprocess.run()
  • do not split the bash command and its parameters, i.e. ['tar c my_dir'] or ["tar c my_dir"]
  • set stdout=subprocess.PIPE for all processes
  • input=proc1.stdout chains the output of the previous process into the input of the next one
  • enable the shell with shell=True
  • This is basically just a restatement of the accepted answer. The use of `run` over `Popen` is a good idea when you can, of course (back when the accepted answer was written, `run` didn't exist). – tripleee Dec 22 '22 at 09:52
  • thanks for posting this example, i needed to have `grep` in my command string, which did weird stuff when being supplied to `split`. – kiltek Feb 03 '23 at 12:02
  • I would prefer this over the accepted answer due to using `run` instead of `shlex` – WolfLink Jul 06 '23 at 05:49
1
>>> from subprocess import Popen,PIPE
>>> import hashlib
>>> proc = Popen(['tar','-c','/etc/hosts'], stdout=PIPE)
>>> stdout, stderr = proc.communicate()
>>> hashlib.md5(stdout).hexdigest()
'a13061c76e2c9366282412f455460889'
>>> 
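
For large archives, `communicate()` holds the whole tar stream in memory before hashing. A possible variation, sketched here with a hypothetical helper name `md5_of_command_output`, reads the child's stdout in chunks and feeds them to the hash incrementally:

```python
import hashlib
from subprocess import PIPE, Popen


def md5_of_command_output(cmd, chunk_size=65536):
    """Run `cmd` (an argument list) and return the MD5 of its stdout.

    Hypothetical helper for illustration; reads in chunks so the full
    output never has to sit in memory at once.
    """
    digest = hashlib.md5()
    with Popen(cmd, stdout=PIPE) as proc:
        # read() returns b"" at end of stream, which stops the iterator
        for chunk in iter(lambda: proc.stdout.read(chunk_size), b""):
            digest.update(chunk)
    # Leaving the with-block waits for the process, so returncode is set
    if proc.returncode != 0:
        raise RuntimeError("command failed with exit code %d" % proc.returncode)
    return digest.hexdigest()
```

Usage would look like `md5_of_command_output(['tar', '-c', '/etc/hosts'])`, which should give the same digest as hashing the `communicate()` output for the same archive bytes.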
tMC