I'm trying to run a long bash command with several pipes inside. I did some research and found the use of os.chdir() to change to a working directory, and subprocess to execute bash commands.

First I need to change into a directory that contains the file sample.txt

os.chdir(path)

This path will contain the file sample.txt

Then I ran the bash command using subprocess (for the sed command, I'll just use something for example):

p = subprocess.Popen("cat sample.txt | grep 'patternA'  | sed 'something' >> ~/out.txt", stdout=subprocess.PIPE, shell=True)
print p.communicate()

I got ('', None) printed out, and when I checked ~/out.txt, it did not contain the output I expected from the bash command (assuming patternA exists in sample.txt).

Is this the right way to use subprocess to run bash commands in Python? Thanks for any help!

Christina Do
  • Hmm, for now I'm just a bit confused about how to capture, say, the output of "cat sample.txt", pipe that output to "grep 'patternA'", then write the new output to a file, all using subprocess... – Christina Do Jan 27 '22 at 23:18

2 Answers

3

There are a number of minor problems with your code, but on the whole, yes, this is how you run a subprocess.

  • Prefer subprocess.run over bare Popen when you can. This would also avoid the behavior you regard as confusing; see the next bullet point.
  • If you do run Popen and then communicate, the result is a tuple with two values: the standard output and the standard error of the finished process.
  • But you are redirecting standard output to a file, so of course stdout=subprocess.PIPE is going to be unnecessary and produce nothing. (Because you don't capture stderr with stderr=subprocess.PIPE, it ends up containing None; if there is any error output, it is simply displayed to the user, and invisible to Python.)
  • Your shell script is horribly overcomplicated. Reducing it to a single process would avoid the need for shell=True which is generally something you should strive for.
  • But even more fundamentally, the script could be reimplemented in native Python, which would make it both more versatile and easier to understand for anyone who is not familiar with both shell script and Python. (Granted, the shell formulation would be a lot more succinct, at least after refactoring.)
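
To make the points about Popen and communicate concrete, here is a minimal sketch. The child command is a made-up stand-in (a short Python one-liner that writes one line to each stream), not anything from the question; it is only there to show what the returned tuple contains depending on which streams are piped.

```python
import subprocess
import sys

# Hypothetical child process: writes one line to stdout and one to stderr.
child = [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]

# Only stdout is piped, so the second element of the tuple is None
# and the stderr text goes straight to the terminal.
p = subprocess.Popen(child, stdout=subprocess.PIPE, text=True)
only_stdout = p.communicate()

# Piping both streams fills both elements of the tuple.
p = subprocess.Popen(child, stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE, text=True)
both = p.communicate()

print(only_stdout)
print(both)
```

If the command itself redirects its output to a file, as in the question, the first element is an empty string because nothing is left to arrive on the pipe.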

The obvious Python implementation would look like

from pathlib import Path
...
with open("sample.txt", "r") as lines, \
        Path("~/out.txt").expanduser().open("w") as output:
    for line in lines:
        if "patternA" in line:
            output.write(line.replace('foo', 'bar'))

where obviously we have to guess wildly at what your sed script actually does, as you have replaced it with a placeholder.

The same with subprocess.run, avoiding the shell programming antipatterns:

import subprocess
from pathlib import Path
...
# expanduser() must be called; without the parentheses it is only a method reference
with Path("~/out.txt").expanduser().open("w") as output:
    subprocess.run(
        ['sed', '/patternA/something', 'sample.txt'],
        stdout=output, text=True, check=True)

You want to avoid the useless use of cat and the useless grep; with those out of the way, you don't need a pipeline, and thus no shell.
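
One detail to watch when folding grep into sed: by default sed prints every input line, including the ones that do not match. A sketch of a filtering variant follows, assuming a sed binary is on the PATH. The `s/A/B/` substitution is borrowed from the other answer purely as a stand-in for the elided sed script, and the temporary directory and sample contents are made up for the demonstration.

```python
import subprocess
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Made-up sample data: two matching lines and one that should be dropped.
    sample = Path(tmp) / "sample.txt"
    sample.write_text("patternA1\nhello\npatternA2\n")

    # -n suppresses sed's default printing; the trailing p prints only
    # lines that match /patternA/ and where the substitution succeeded.
    r = subprocess.run(
        ["sed", "-n", "/patternA/s/A/B/p", str(sample)],
        capture_output=True, text=True, check=True)
    filtered = r.stdout

print(filtered, end="")
```

Without `-n` and `p`, the `hello` line would pass through unchanged, which is not what the original `grep | sed` pipeline does.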

If you want to retrieve status information from the subprocess, assign the result from subprocess.run to a variable you can examine, say r; the exit status will be in r.returncode (though with check=True it's guaranteed to be 0).
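
A small sketch of examining the exit status, using the `returncode` attribute of the object that subprocess.run returns. The two child commands are hypothetical Python one-liners chosen only because their exit status is known in advance; check=True is left out here so that the failing command returns normally.

```python
import subprocess
import sys

# Hypothetical commands: one succeeds, one exits with status 3.
ok = subprocess.run([sys.executable, "-c", "pass"])
bad = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])

# Without check=True, a failure is only visible through returncode.
print(ok.returncode, bad.returncode)
```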

Python won't let you mix capture_output=True with stdout=... and/or stderr=... so if you want to see whether there is error output (there could be a warning message from some tools even when they succeed) you have to split the operation. Perhaps like this:

import logging
import subprocess
from pathlib import Path
...
r = subprocess.run(
    ['sed', '/patternA/something', 'sample.txt'],
    capture_output=True, text=True, check=True)
with Path("~/out.txt").expanduser().open("w") as output:
    output.write(r.stdout)
if r.stderr:
    # logging.warn() is a deprecated alias; use logging.warning()
    logging.warning(r.stderr)

As a final aside, os.path.expanduser() or pathlib.Path.expanduser() are necessary to resolve ~/out.txt to a file in your home directory. You should generally never need to os.chdir() to find a file; just specify its path name if it's not in the current directory. See also What exactly is current working directory?
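
As an illustration of both points, here is a sketch that expands `~` explicitly and, for the case where a command really does need to run in a particular directory, passes `cwd=` to subprocess.run instead of calling os.chdir(). The temporary directory, its sample.txt contents, and the child one-liner are assumptions made up for the demonstration.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# "~" is a shell convention; Python needs an explicit expansion.
out_path = Path("~/out.txt").expanduser()

with tempfile.TemporaryDirectory() as workdir:
    # Made-up input file inside the directory the child should work in.
    Path(workdir, "sample.txt").write_text("patternA1\n")

    # cwd= sets the working directory for the child only;
    # the parent's current directory is left alone.
    r = subprocess.run(
        [sys.executable, "-c", "print(open('sample.txt').read(), end='')"],
        cwd=workdir, capture_output=True, text=True, check=True)
    child_output = r.stdout

print(out_path)
print(child_output, end="")
```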

tripleee
  • For (much) more on these topics, see also [my answer to the near identically titled question _Running Bash commands in Python_](https://stackoverflow.com/questions/4256107/running-bash-commands-in-python/51950538#51950538) – tripleee Jan 28 '22 at 07:27
1

Your code:

import os, subprocess

with open("sample.txt", "w") as f:
    print("patternA1\npatternA2\nhello\nhi", file=f)

p = subprocess.Popen(
    "cat sample.txt | grep 'patternA' | sed 's/A/B/' > out.txt", 
    stdout=subprocess.PIPE, shell=True)

out.txt has been populated.

>>> p.communicate()
(b'', None)
>>> print(open('out.txt').read(), end='')
patternB1
patternB2

Because you're using the shell and redirecting the output directly to a file, you're essentially doing an os.system() call.

>>> os.remove('out.txt')
>>> os.system("cat sample.txt | grep 'patternA' | sed 's/A/B/' > out.txt")
0
>>> print(open('out.txt').read(), end='')
patternB1
patternB2
  • Like the documentation already says, you should probably prefer `subprocess` over `os.system`, though probably not specifically `subprocess.Popen` if you can use one of the higher-level functions like `subprocess.check_call()` or `subprocess.run()`. The former is only a few characters longer than `os.system()` but provides several benefits. – tripleee Jan 28 '22 at 11:22
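
A rough sketch of one such benefit: `subprocess.check_call()` raises an exception on failure instead of returning a status that is easy to ignore. The child commands below are hypothetical Python one-liners, used only so that the outcome is predictable.

```python
import subprocess
import sys

# Returns 0 on success, like os.system(), but takes an argument list
# and so needs no shell.
status = subprocess.check_call([sys.executable, "-c", "pass"])

# A failing command raises CalledProcessError rather than handing back
# a nonzero status.
try:
    subprocess.check_call([sys.executable, "-c", "raise SystemExit(2)"])
    raised = False
except subprocess.CalledProcessError as exc:
    raised = True
    code = exc.returncode

print(status, raised, code)
```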