
I am trying to call an executable called foo, and pass it some command line arguments. An external script calls into the executable and uses the following command:

./main/foo --config config_file 2>&1 | /usr/bin/tee temp.log

The script uses Popen to execute this command as follows:

from subprocess import Popen
from subprocess import PIPE
def run_command(command, returnObject=False):
    cmd = command.split(' ')
    print('%s' % cmd)
    p = None
    print('command : %s' % command)
    if returnObject:
        p = Popen(cmd)
    else:
        p = Popen(cmd)
        p.communicate()
        print('returncode: %s' % p.returncode)
        return p.returncode
    return p

command = "./main/foo --config config_file 2>&1 | /usr/bin/tee temp.log
"
run_command(command)

However, this passes extra arguments ['2>&1', '|', '/usr/bin/tee', 'temp.log'] to the foo executable.
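
For reference, the split only tokenizes on spaces, so the shell operators just become ordinary list items:

>>> "./main/foo --config config_file 2>&1 | /usr/bin/tee temp.log".split(' ')
['./main/foo', '--config', 'config_file', '2>&1', '|', '/usr/bin/tee', 'temp.log']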

How can I stop these extra arguments from being passed to foo while keeping the same functionality? I have tried shell=True, but I have read that it should be avoided for security reasons (shell injection attacks). Looking for a neat solution.

Thanks

UPDATE: Updated the filename following the tee command (changed `>temp.log` to `temp.log`).

shank22
  • First say what you want to do with the extra arguments. Next say what you are trying to achieve: writing a full shell with pipes and redirections in Python, or just a proof of concept. And finally, you should weigh the risk of shell injection (which can be mitigated by forcing a full path to a known shell) against the risk of a bad implementation. But without a minimum of context I cannot guess what you really want. – Serge Ballesta Aug 11 '15 at 21:32
  • It is clear from the problem description what I want. I do not want to pass ['2>&1', '|', '/usr/bin/tee', '>temp.log'] to foo. – shank22 Aug 11 '15 at 21:38
  • Those arguments are shell constructs -- they have no meaning to anything **but** a shell. – Charles Duffy Aug 11 '15 at 21:58
  • ...also, calling `tee` without passing it any arguments is silly -- with no arguments, it reads from stdin and passes to stdout; instead of redirecting that stdout to a file, you might as well have just redirected the original program's stdout directly; in shell terms: `./main/foo --config config_file >temp.log 2>&1`? – Charles Duffy Aug 11 '15 at 22:03
  • (to be clear, `>temp.log` is not an argument to tee; it tells a shell to direct FD 1 of the tee process to `temp.log` *before starting `tee` at all*). – Charles Duffy Aug 11 '15 at 22:06
  • tee prints it to the console. What if I needed the output on the console too? – shank22 Aug 11 '15 at 22:35
  • @shank22, `tee >filename` **does not** print to the console; it only writes output to the file. Perhaps you're thinking of `tee filename`. – Charles Duffy Aug 11 '15 at 22:36
  • Charles, you are correct! – shank22 Aug 11 '15 at 22:40

3 Answers


The string

./main/foo --config config_file 2>&1 | /usr/bin/tee >temp.log

...is full of shell constructs. These have no meaning to anything without a shell in play. Thus, you have two options:

  • Set shell=True
  • Replace them with native Python code.

For instance, 2>&1 is the same thing as passing stderr=subprocess.STDOUT to Popen, and your tee -- since its output is redirected and it's passed no arguments -- could just be replaced with stdout=open('temp.log', 'w').


Thus:

p = subprocess.Popen(['./main/foo', '--config', 'config_file'],
  stderr=subprocess.STDOUT,
  stdout=open('temp.log', 'w'))
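
(If you prefer to close the log file handle explicitly, a rough equivalent using a context manager and subprocess.call would be:)

with open('temp.log', 'w') as logfile:
    returncode = subprocess.call(['./main/foo', '--config', 'config_file'],
                                 stderr=subprocess.STDOUT,
                                 stdout=logfile)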

...or, if you really did want the tee command, but were just using it incorrectly (that is, if you wanted tee temp.log, not tee >temp.log):

p1 = subprocess.Popen(['./main/foo', '--config', 'config_file'],
  stderr=subprocess.STDOUT,
  stdout=subprocess.PIPE)
p2 = subprocess.Popen(['tee', 'temp.log'], stdin=p1.stdout)
p1.stdout.close() # drop our own handle so p2's stdin is the only handle on p1.stdout
p2.communicate()  # wait for the pipeline to finish; tee's stdout goes straight to the console

Wrapping this in a function, and checking success for both ends might look like:

def run():
    p1 = subprocess.Popen(['./main/foo', '--config', 'config_file'],
      stderr=subprocess.STDOUT,
      stdout=subprocess.PIPE)
    p2 = subprocess.Popen(['tee', 'temp.log'], stdin=p1.stdout)
    p1.stdout.close() # drop our own handle so p2's stdin is the only handle on p1.stdout
    # True if both processes were successful, False otherwise
    return (p2.wait() == 0 and p1.wait() == 0)
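
A caller would then just check the boolean, e.g.:

if not run():
    print('foo | tee pipeline failed')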

By the way -- if you want to use shell=True and return the exit status of foo, rather than tee, things get a bit more interesting. Consider the following:

p = subprocess.Popen(['bash', '-c', 'set -o pipefail; ' + command_str])

...the pipefail bash extension will force the shell to exit with the status of the last (rightmost) pipeline component to exit with a non-zero status (and 0 if no components fail), rather than using only the exit status of the final component.
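
A rough end-to-end sketch of that route (with command_str standing in for the pipeline string from the question):

command_str = "./main/foo --config config_file 2>&1 | /usr/bin/tee temp.log"
p = subprocess.Popen(['bash', '-c', 'set -o pipefail; ' + command_str])
p.wait()
# non-zero if foo (or tee) failed, not just if tee failed
print('pipeline exit status: %s' % p.returncode)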

Charles Duffy
  • Just setting shell=True does not seem to redirect the output to the console. – shank22 Aug 11 '15 at 22:30
  • @shank22, of course it doesn't -- you're redirecting stdout from tee to a file. Why would you think it would go to the console as well? – Charles Duffy Aug 11 '15 at 22:31
  • @shank22, which is to say: If you want output to go to the console *as well as* the file, you'd have to do `tee temp.log`, not `tee >temp.log`, passing `temp.log` as an argument to tee, rather than passing tee no arguments but giving the shell a command to redirect its output. – Charles Duffy Aug 11 '15 at 22:31
  • How would it look like with shell=True? – shank22 Aug 11 '15 at 22:41
  • With `shell=True`, you wouldn't do any splitting yourself at all -- you'd just pass the whole input string straight to `subprocess.Popen()` as the first argument. – Charles Duffy Aug 11 '15 at 22:42
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/86738/discussion-between-shank22-and-charles-duffy). – shank22 Aug 11 '15 at 22:51
  • Shouldn't the ordering of the subprocesses be different? `p1 = subprocess.Popen(['tee', 'temp.log'], stdin=subprocess.PIPE) p2 = subprocess.Popen(['./main/foo', '--config', 'config_file'], stderr=subprocess.STDOUT, stdout=p1.stdin)` – shank22 Aug 12 '15 at 20:36
  • It certainly *could* be different; *should* is a judgment call. Actually, that's worth changing just for clarity. – Charles Duffy Aug 12 '15 at 20:40
  • Something like this? : `p = Popen(command.split(), stderr=subprocess.STDOUT, stdout=subprocess.PIPE) p1 = subprocess.Popen(['tee', file_name], stdin=p.stdout)` – shank22 Aug 12 '15 at 20:42
  • `p1.stdout.close()` would result in a different behavior compared to the original use case? What is the significance of that? Would it mean p1's stdout would not be seen on the console? – shank22 Aug 12 '15 at 20:47
  • It's a correctness fix. Without it, it's possible for p2 to hang waiting for an inbound EOF. No, it doesn't impact console output, which comes from p2, not p1. – Charles Duffy Aug 12 '15 at 20:51
  • So now p2 should be returned to caller instead of p? Would that affect the callers? – shank22 Aug 12 '15 at 21:00
  • That's not new -- that was correct usage for the first version too; I just didn't explicitly document either way. – Charles Duffy Aug 12 '15 at 21:01
  • Can you document it? Would be useful for anyone that reads it. – shank22 Aug 12 '15 at 21:02
  • ...well, I say that, but it depends. What's the return value **actually used for**? If it's for checking return code, for instance, `p1.returncode` is arguably more important; whereas if it's used for retrieving output, one wants p2. – Charles Duffy Aug 12 '15 at 21:03
  • Frankly, I'd argue that the best practice is to actually run the pipeline from inside the function and return `(p1.returncode==0 && p2.returncode==0)`, unless there's a compelling reason to do otherwise. – Charles Duffy Aug 12 '15 at 21:04
  • The callers poll the original process (p) for completion. So what would be good to return in this case? – shank22 Aug 12 '15 at 21:05
  • If it's just about completion, p2 -- that way you'll only be notified after it's done flushing output to disk and console. If it's also about success (return code), then you want p1 as well, so you might return a tuple with both. – Charles Duffy Aug 12 '15 at 21:07
  • It's more than just completion. Different callers do different things, like polling for completion, killing it, etc. So since they have just one process, I would somehow have to map the two processes to one when exiting the function. How can I gracefully do that? Or should I just use `shell=True`? – shank22 Aug 12 '15 at 21:10
  • @shank22, you can't even be sure that `shell=True` will be graceful in the use cases at hand without individually auditing them -- it'll mean you're returning the PID, and status, of the one shell, not of the two underlying processes in the pipeline, and depending on the behavior of that shell when signalled/polled/etc. to be what you want. – Charles Duffy Aug 12 '15 at 21:14
  • @shank22, ...it's not like situations where badly written code leaves subprocesses sitting around after it means to kill them because it only signals their parent processes are uncommon. Though for this specific example, how severe that is depends on the amount of output written by the first piece of the pipeline -- if you kill only one side, the other will notice (via a SIGPIPE) when it next tries to read or write from it. – Charles Duffy Aug 12 '15 at 21:15
  • So how do I handle this? Stuck with this legacy code and would not want to modify the callers. What would be a good solution? – shank22 Aug 12 '15 at 21:19
  • `shell=True`, probably. If it's buggy, the bugs can be blamed on your callers (for not understanding shell behavior or being cautious in pipeline construction) and/or the shell (for not doing what people mean rather than what they say). :) – Charles Duffy Aug 12 '15 at 21:19
  • Thanks Charles! Makes sense :) – shank22 Aug 12 '15 at 21:24
  • you could simplify the code if you [use `stdout=tee.stdin`](http://stackoverflow.com/a/31993624/4279) – jfs Aug 13 '15 at 16:25
  • How can I get the child process' exit code with `shell=True`? `shell=True` returns the shell process' exit code. – shank22 Aug 27 '15 at 18:16
  • The shell exits with the status of the last command it ran unless it's explicitly given an exit status. Even `exit`, standing alone with no arguments, behaves that way. – Charles Duffy Aug 27 '15 at 19:08
  • That said, for a pipeline, the exit status is that of the last thing it contains -- in this case, `tee`. If you want to get the status of the thing that fed into `tee`, you need to force `bash` rather than `sh` and use one of a few available extensions. I'll amend appropriately. – Charles Duffy Aug 27 '15 at 19:09
  • The problem is `shell=True` returns the pid of the shell process and not of the `foo` or `tee` process. Will using pipefail be equivalent to returning the pid of the foo process? – shank22 Aug 27 '15 at 21:15
  • Wait, you want the pid, not the exit status? Why does the pid matter? (And you **definitely** asked for exit status above). – Charles Duffy Aug 27 '15 at 21:18
  • My bad, I want the pid of foo with `shell=true`. I need the pid since the consumers use the pid and I do not have access to the consumers. – shank22 Aug 27 '15 at 21:20
  • Use it how? To check if it's still running? The shell's pid works just as well. To send it signals? If the signals are trappable ones, you can teach the shell to relay them. (*How* to do that is a separate question, but I'm 99% certain it's already asked and answered on this site). – Charles Duffy Aug 27 '15 at 21:34
  • I mean, there are definitely ways to get the pid of `foo`, but portable / non-hacky ways, not so much. – Charles Duffy Aug 27 '15 at 21:35

Here are a couple of "neat" code examples in addition to the explanation in @Charles Duffy's answer.

To run the shell command in Python:

#!/usr/bin/env python
from subprocess import check_call

check_call("./main/foo --config config_file 2>&1 | /usr/bin/tee temp.log",
           shell=True)
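
(check_call raises CalledProcessError if the pipeline exits with a non-zero status, which without pipefail means tee's status; a minimal sketch of handling that:)

from subprocess import CalledProcessError
try:
    check_call("./main/foo --config config_file 2>&1 | /usr/bin/tee temp.log",
               shell=True)
except CalledProcessError as e:
    print('pipeline failed with status %s' % e.returncode)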

without the shell:

#!/usr/bin/env python
from subprocess import Popen, PIPE, STDOUT

tee = Popen(["/usr/bin/tee", "temp.log"], stdin=PIPE)
foo = Popen("./main/foo --config config_file".split(), 
            stdout=tee.stdin, stderr=STDOUT)
pipestatus = [foo.wait(), tee.wait()]

Note: don't use "command arg".split() with non-literal strings.
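
To check whether both halves of the pipe succeeded, inspect the collected statuses, e.g.:

if any(pipestatus):
    print('pipeline failed: foo=%s, tee=%s' % tuple(pipestatus))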

See How do I use subprocess.Popen to connect multiple processes by pipes?

jfs
  • This is where I grumble about the use of `string.split()` -- creating bugs if handling arguments with literal spaces (or with code containing shell quoting, escaping or redirection directives, all of which accepting a string can be read to imply support for); at least `shlex.split()` doesn't restrict the range of possible commands to exclude those with literal whitespace in arguments. But otherwise, the further examples are appreciated. – Charles Duffy Aug 13 '15 at 16:45
  • @CharlesDuffy: there is nothing wrong with using `.split()` for *literal* strings – jfs Aug 13 '15 at 16:46
  • What's wrong is that it becomes habit, and gets used in other scenarios as well -- such as the OP's original code above. – Charles Duffy Aug 13 '15 at 16:47
  • @CharlesDuffy: I don't buy it unless it is backed up by evidence. It reminds me of "gateway drug", "games cause violence", etc. that sound plausible but are bogus in reality. You can't accidentally confuse literal and non-literal strings. [`shlex.split()` is not a panacea](http://stackoverflow.com/questions/28468807/python-executing-a-shell-command/28468860#comment45287698_28468860). You have to correct its results manually (though it is useful for creating a "draft" list of arguments). – jfs Aug 13 '15 at 17:15
  • The OP's misuse *in the question here* -- wherein they have a reusable function used by callers they don't control -- is not example enough?! Keep in mind that your example above was not explicitly commenting or noting that it was demonstrating a practice safe only with literals. If someone is copying practices espoused by high-rep users on StackOverflow -- something we should support, should we not -- then can they not expect to be able to use that practice widely, unless it's presented next to an explicit warning or disclaimer limiting its applicability? – Charles Duffy Aug 13 '15 at 17:18
  • Moreover: I agree that one can't accidentally confuse literal and non-literal strings, but one most certainly can write code with one use case in mind, and then have that code modified or reused without a detailed audit of whether it uses practices which are only safe in a limited set of scenarios. – Charles Duffy Aug 13 '15 at 17:19
  • @CharlesDuffy: OP's example uses a **variable**. btw, your answer explains why OP's model (passing `[..., '2>&1', '|',]`) is not correct. I've added the explicit warning anyway. – jfs Aug 13 '15 at 17:58
  • OP's example takes a *parameter*. That the parameter happens to be a constant in the example given is by no means reason to believe that it's constant in every actual use case, as they've indicated in comments that it has numerous callers they don't control. That said, I'm happy with the warning. – Charles Duffy Aug 13 '15 at 18:01
  • `command` in the question is a variable that is also a parameter. How `command` is created does not matter. It is a variable in the `run_command()` function. – jfs Aug 13 '15 at 18:04
  • Whether it's guaranteed to be a value hand-written and reviewed by a human familiar with the semantics of `run_command()` and how they vary from shell semantics matters very much -- and as the OP has indicated in comments that they don't control all the code calling this function... – Charles Duffy Aug 13 '15 at 18:05
  • Then I entirely fail to grasp it. – Charles Duffy Aug 13 '15 at 18:06
  • ...are you perhaps distinguishing between a variable populated from a literal and the literal itself? (I was not, which is why I took "OP's example uses a **variable**" as to not contradict the position I thought you were arguing from). – Charles Duffy Aug 13 '15 at 18:10
  • @CharlesDuffy: `"this is literal"`, `this_is_variable` – jfs Aug 13 '15 at 18:22
  • Obviously so -- but if I took you at your literal word (ha!), that would be contradicting the applicability of your answer to the question, so I opted for a more charitable interpretation. – Charles Duffy Aug 13 '15 at 18:23
  • @CharlesDuffy: yes, my answer does not contain a general shell parser (to generate Python code for `2>&1`, `|` automatically) i.e., none of the answers including yours do it (or should do it). – jfs Aug 13 '15 at 18:38

You may combine answers to two StackOverflow questions:
1. piping together several subprocesses, for the `x | y` part
2. Merging a Python script's subprocess' stdout and stderr (while keeping them distinguishable), for the `2>&1` part
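
Combined, those two answers come down to the same shape as the snippets above: merge stderr into stdout with stderr=STDOUT, and feed that stream into a tee process. A minimal sketch, using the same paths as the question:

from subprocess import Popen, PIPE, STDOUT

tee = Popen(['/usr/bin/tee', 'temp.log'], stdin=PIPE)   # the "| /usr/bin/tee temp.log" part
foo = Popen(['./main/foo', '--config', 'config_file'],
            stdout=tee.stdin,   # pipe foo's stdout into tee
            stderr=STDOUT)      # the "2>&1" part
foo.wait()
tee.stdin.close()               # let tee see EOF once foo finishes
tee.wait()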

AnFi
  • For the current use case, would setting shell=True be bad? ([piping together several subprocesses](http://stackoverflow.com/questions/6341451/piping-together-several-subprocesses)) – shank22 Aug 11 '15 at 21:46
  • Also, I am interested in knowing how this can be leveraged with the current command. I only have access to run_command method to make the changes. – shank22 Aug 11 '15 at 21:56
  • It depends. The command you included in your question has no "variable" parts, so security is not a big risk (IMHO). Running every part of the pipeline directly from Python gives you the exit codes of every program/command in the chain. So it is your choice: simplicity vs. full/detailed control. [I have assumed you provided a "sample" command => the command you really want may be different] – AnFi Aug 11 '15 at 21:58
  • Short question: Are any of these components user-configurable? The config file name? The log name? Anything else? – Charles Duffy Aug 11 '15 at 21:59
  • The command is not user configurable when it comes to run_command. It is a string that is just split and given to Popen. So wondering how it can be done. – shank22 Aug 11 '15 at 22:03
  • Why make it a string in the first place, and not define the command as a list to start with? Hardcoding the correct split would eliminate the relevant classes of errors (splitting on whitespace within names, for instance), instead of having your Python code do the string-splitting. – Charles Duffy Aug 11 '15 at 22:05
  • Re: links given, `2>&1` doesn't keep stderr and stdout distinguishable, so the linked answer doesn't seem on point. – Charles Duffy Aug 11 '15 at 22:07
  • @CharlesDuffy As I understand: The linked question is not on point. The answer is on point :-) – AnFi Aug 11 '15 at 22:30