I was researching whether Python can replace Bash for shell scripting. I have seen that Python can execute Linux commands using subprocess.call() or os.system(), but I read somewhere (I forgot the link to the article) that using these is a bad thing. Is this really true?

If yes, then why is it a bad thing?

If not, is it safe to say that Python can indeed replace Bash for scripting, since I could just execute Linux commands using either of the two function calls?

Note: If I'm not mistaken, os.system() is deprecated and subprocess.call() should be used instead but that is not the main point of the question.

Patrick
  • There's nothing wrong with using the subprocess module. Just try to [avoid setting `shell=True`](https://stackoverflow.com/a/3172488/1222951). – Aran-Fey Jun 09 '17 at 14:03
  • We'd **really** need to read the article to be able to refute it. Without a specific claim, this is pretty vague. – Charles Duffy Jun 09 '17 at 14:04
  • ...that said, Python *absolutely* can do everything bash can do, and much of it better. What it *can't* do in many cases is do those things comparably tersely without compromising correctness: In bash, if you want to pass a variable's value as a literal, all you need to do is ensure that the expansion is quoted. In Python, if you're running with `shell=True`, you need to invoke `pipes.quote()` (`shlex.quote()` in Python 3) to get comparable safety, or do a more verbose invocation with `shell=False`. – Charles Duffy Jun 09 '17 at 14:05
  • It's a reasonable claim if it's saying "executing commands with `system` is bad _if you could get the same behavior without using `system`_". For example, `os.listdir` is preferable to `subprocess.check_output("ls")` if all you want is a list of filenames. – Kevin Jun 09 '17 at 14:07
  • @Kevin, ...I'd argue that the C standard library call `system()`, like `os.system()` -- inasmuch as both call `sh -c "string"` -- is bad *always*, full-stop, no matter what. If one is going to be executing external commands, the Right Way is an `execv`-family syscall with an explicit argument list. Inasmuch as `system()` is thus an unnecessarily error-prone way of executing external commands, it's better avoided in favor of preferable ones, *even when use of an external command is itself unavoidable*. – Charles Duffy Jun 09 '17 at 14:08
  • @Kevin, ...that said, `subprocess.check_output("ls")` is *not* actually calling `system()` -- with the default `shell=False`, it's literally directly `exec`ing the first instance of `ls`, with no intervening shell. Bad practice for other reasons, of course. – Charles Duffy Jun 09 '17 at 14:12
  • Take a look at [xonsh](http://xon.sh/index.html) — Python-powered shell. – phd Jun 09 '17 at 14:15
  • @phd, ...heh. Actually, for someone wanting to "replace" the shell -- i.e. have an utterly non-shell language -- that might be a good idea. For someone who wants a "better" shell, I utterly bristle at the idea of encouraging non-POSIX-compliant shells -- the language is baroque and arcane, but it's a very well-understood and widespread kind of baroque and arcane, and anyone who knows the rules can write correct scripts for any shell in the family. By contrast, folks who get too accustomed to fish or zsh (for example) lose their ability to write correct scripts in bash/ksh/sh. – Charles Duffy Jun 09 '17 at 14:19
  • @CharlesDuffy I just wanted to point out that it's a perfectly valid idea to call the shell from Python and to write a shell in Python. As for the rest of your complaint: personally, I use bash. :-) – phd Jun 09 '17 at 15:57
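
The quoting point raised in the comments above can be sketched as follows. This is an illustration only, not code from the thread: it assumes Python 3.7+ on a POSIX system where `sh` and `printf` are available, and the `filename` value is made up. `shlex.quote()` is the Python 3 name for the `pipes.quote()` mentioned above.

```python
import shlex
import subprocess

# Hypothetical untrusted value containing shell metacharacters.
filename = "my file; echo INJECTED"

# shell=False (the default): each list element reaches the program
# as one literal argument, so no quoting is needed.
safe_list = subprocess.run(
    ["printf", "%s\n", filename], capture_output=True, text=True
).stdout

# shell=True: the string is parsed by sh, so the value must be quoted first.
safe_quoted = subprocess.run(
    "printf '%s\\n' " + shlex.quote(filename),
    shell=True, capture_output=True, text=True,
).stdout

# shell=True without quoting: sh treats ';' as a command separator
# and runs 'echo INJECTED' as a second command.
unsafe = subprocess.run(
    "printf '%s\\n' " + filename,
    shell=True, capture_output=True, text=True,
).stdout

print(repr(safe_list))
print(repr(safe_quoted))
print(repr(unsafe))
```

The first two calls print the value unchanged; the third lets the shell split and execute it, which is the risk `shell=True` introduces.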

2 Answers

Using os.system() or subprocess.call(..., shell=True) can't replace a shell, because they actually use the shell.

os.system("foo") actually invokes sh -c "foo" -- that is to say, it runs foo as a shell script. Using this, then, is in no respect replacing a shell. This is also true in the exact same way for subprocess.call("foo", shell=True).
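
One way to observe this is to ask the child what `$0` is. This is a small sketch, assuming Python 3.7+ on a POSIX system; the `echo` commands are stand-ins chosen for the demonstration, not part of the original answer.

```python
import subprocess

# With shell=True the string is handed to `sh -c`, so sh expands $0
# to the name it was started under.
via_shell = subprocess.run(
    "echo $0", shell=True, capture_output=True, text=True
).stdout.strip()

# With the default shell=False, echo is executed directly and receives
# "$0" as literal text, because no shell is there to expand it.
direct = subprocess.run(
    ["echo", "$0"], capture_output=True, text=True
).stdout.strip()

print(via_shell)  # typically /bin/sh on Linux
print(direct)     # $0
```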


Using subprocess.Popen and functions from that family can replace a shell, but this often results in verbose and unwieldy code.

Consider the following shell script:

    #!/bin/sh
    foo "$1" | bar "$2"

Now, let's look at what it takes to reproduce that in Python in a way that doesn't start any shell under the hood:

    #!/usr/bin/env python
    import subprocess, sys

    p1 = subprocess.Popen(["foo", sys.argv[1]], stdout=subprocess.PIPE)
    p2 = subprocess.Popen(["bar", sys.argv[2]], stdin=p1.stdout)
    sys.exit(p2.wait())

We went from 19 characters (after the shebang) to 148 characters (after the shebang and imports) -- and this was for a completely trivial script, one not using fancier features such as process substitution, command substitution, or the like.
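
For anyone who wants to run the pattern, here is a slightly fuller sketch with real commands substituted for `foo` and `bar`. The helper name `run_pipeline` and the `printf`/`tr` stand-ins are assumptions made for illustration; it assumes Python 3.7+ on a POSIX system. Closing the parent's copy of the pipe follows the pipeline recipe in the `subprocess` documentation.

```python
import subprocess


def run_pipeline(first_cmd, second_cmd):
    """Run `first_cmd | second_cmd` without a shell.

    Returns (captured output, exit status of second_cmd).
    """
    p1 = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(
        second_cmd, stdin=p1.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close the parent's copy of the pipe so p1 can receive SIGPIPE
    # if p2 exits before reading everything.
    p1.stdout.close()
    output, _ = p2.communicate()
    p1.wait()  # reap the first process as well
    return output, p2.returncode


# Stand-in commands: equivalent to `printf 'hello\n' | tr a-z A-Z` in sh.
output, status = run_pipeline(["printf", "hello\n"], ["tr", "a-z", "A-Z"])
print(output, end="")
print(status)
```

Wrapping the plumbing in a function recovers some of the terseness, but every shell feature used (pipes, redirections, substitutions) still needs its own explicit Python equivalent.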

Charles Duffy
  • What if we used shell=False instead, does that still invoke the shell? – Patrick Jun 09 '17 at 14:33
  • @Patrick, with `shell=False` (which is the default!), no shell is invoked. – Charles Duffy Jun 09 '17 at 14:34
  • So if I use shell=False, doesn't that make your first argument invalid since I would not be invoking the shell? – Patrick Jun 09 '17 at 14:36
  • @Patrick, that's why I explicitly scoped my argument to only apply with `shell=True`. If I say "if X then Y", and you say "but when not-X then not-Y", that doesn't "invalidate" anything. It *does* make that argument *inapplicable* to such cases, but most of the ways to avoid the verbosity mentioned in the second argument require `shell=True` or an equivalent. The first argument is thus there principally to foreclose a specific possible objection to the latter. – Charles Duffy Jun 09 '17 at 14:44
  • I'm sorry, "invalidate" was the wrong choice of words. I would like to know why there is a need for the verbose version instead of just using shell=False? Does it have any advantages? – Patrick Jun 09 '17 at 14:57
  • What exactly do you mean by "just using `shell=False`"? If you try to do `subprocess.Popen('foo | bar', shell=False)` or `subprocess.Popen(['foo', '|', 'bar'], shell=False)` it **won't work** (well, won't work the way you want it to -- the latter passes `|` and `bar` as arguments to `foo`), because `|` is a shell construct, and if you don't have a shell, then shell constructs aren't available to you. – Charles Duffy Jun 09 '17 at 14:59
  • I am referring to using `subprocess.call('foo | bar')`, without using the `shell=True` argument. – Patrick Jun 09 '17 at 15:02
  • `subprocess.call('foo | bar')` is looking for a command named `foo | bar`, with the space and the pipe as part of the filename. So it'll look for `/bin/foo | bar`, `/usr/bin/foo | bar`, `/usr/local/bin/foo | bar`, and when it doesn't find any executable named `foo | bar` it'll give you back an error. – Charles Duffy Jun 09 '17 at 15:03
  • Splitting `foo | bar` into two separate commands, one named `foo` and the other named `bar` with the output of the first passed as the input to the second is something shells do. No shell == that doesn't get done, unless you do it yourself by writing the longer/hairier Python version. – Charles Duffy Jun 09 '17 at 15:04
  • Alright, now I get it. Thank you. I can definitely see that there are still some situations in which bash might be useful. – Patrick Jun 09 '17 at 15:19
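
The two failure modes discussed in the comments above can be reproduced with a short sketch. It assumes Python 3.7+ on a POSIX system, and uses `echo` and `cat` as stand-ins for `foo` and `bar`.

```python
import subprocess

# A single string with shell=False is treated as the program name,
# spaces and pipe character included, so the lookup fails.
try:
    subprocess.call("echo hi | cat")
    lookup_failed = False
except FileNotFoundError:
    lookup_failed = True

# As a list, "|" and "cat" are just literal arguments passed to echo;
# no second process is started.
literal = subprocess.run(
    ["echo", "hi", "|", "cat"], capture_output=True, text=True
).stdout

print(lookup_failed)
print(repr(literal))
```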

In general, it is not a bad thing to create another process from your own process. People do this constantly in bash.

However, one should always ask which environment is best suited to the task at hand. For instance, I could easily call a Python script to cut (the Linux tool) a column from a file. However, the overhead of first starting the Python interpreter, then saving the output from cut, and then saving that again is possibly higher than checking how to use the bash tool with man.

That said, collecting the output of another "serious" program in order to do further calculations on it is something you can do nicely with subprocesses (though I would opt for storing that output in a file and then reading the file back in if I needed to rerun my script).

And this is where launching a subprocess may get tricky: depending on how you open a new subprocess, you can no longer rely on environment variables. Also, especially when dealing with large input data, the output from the subprocess does not get piped onward and is therefore collected in memory until the program finishes, which might lead to memory problems.
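
Whether output accumulates in memory depends on how it is read: `communicate()` and `check_output()` buffer everything, while iterating over the pipe handles one line at a time. Below is a minimal sketch of the streaming approach, assuming Python 3.7+. The helper name `count_lines_streaming` and the child command are made up for illustration.

```python
import subprocess
import sys


def count_lines_streaming(cmd):
    """Count lines of a child's output without holding all of it in memory."""
    count = 0
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        # Reads one line at a time as the child produces it.
        for _line in proc.stdout:
            count += 1
    # Leaving the with-block closes the pipe and waits for the child.
    return count, proc.returncode


# Stand-in child process: a Python one-liner that prints 1000 lines.
child = [sys.executable, "-c", "for i in range(1000): print(i)"]
lines, status = count_lines_streaming(child)
print(lines, status)
```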

In short: if using Python solves your problem faster than combining bash-only tools, sure, do it. If that involves launching serious subprocesses, OK. However, if you want to replace bash with Python, do not do that.

mjoppich
  • Eh? The "collected in memory" problem is exactly the same whether you're using `p.communicate()` in Python or `output=$(command)` in shell. And I don't follow your argument about environment variables at all -- they're treated exactly the same way in both cases. Could you provide some code samples illustrating your arguments? – Charles Duffy Jun 09 '17 at 14:30
  • If $SOMEDIR was exported in bashrc, `subprocess.call(["/usr/bin/ls", "$SOMEDIR"], shell=True)` would not know about that (unless you specify it as environment). Usually one pipes output from `$(command)` within bash (scripts) to the next script, or writes it directly to some file and then uses that file as input. (Doing this within Python can get ugly, which is why you should always use the solution that suits your problem best.) Memory-wise, of course, if you store data, you store it. Still, in Python you will possibly save more because you "need to process it in the same script". – mjoppich Jun 09 '17 at 15:26
  • Eh? `subprocess.call(["/usr/bin/ls", "$SOMEDIR"], shell=True)` doesn't pass `$SOMEDIR` to `ls` at all, so that's just broken code -- it doesn't say anything about anything. – Charles Duffy Jun 09 '17 at 15:38
  • And the correct version, `subprocess.call(['/usr/bin/ls "$SOMEDIR"'], shell=True)` actually **does** interpret the environment variable in question. – Charles Duffy Jun 09 '17 at 15:38
  • While `echo $SOMEDIR -> /home/proj/` returns that directory nicely, `subprocess.call(['/usr/bin/ls "$SFBDATA"'], shell=True)` returns `/usr/bin/ls: cannot access '': No such file or directory` on my system. Btw, `subprocess.call(["ls", "-hl", "/tmp"], shell=False)` will nicely return the content of `tmp` ... – mjoppich Jun 09 '17 at 15:47
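
The three call shapes argued about in these comments behave differently, which the following sketch tries to make visible. It assumes Python 3.7+ on a POSIX system and uses `echo` instead of `ls` so the result is easy to inspect. `SOMEDIR` is the variable name from the comments; its value here is made up, and it is passed explicitly via `env` to stand in for a variable that is present in the Python process's environment.

```python
import os
import subprocess

# Copy of the current environment with the demo variable added.
env = dict(os.environ, SOMEDIR="/tmp")

# One command string with shell=True: sh parses it and expands $SOMEDIR
# from the environment the child inherited.
expanded = subprocess.run(
    'echo "$SOMEDIR"', shell=True, env=env, capture_output=True, text=True
).stdout.strip()

# A list with shell=True: only the first item is the command; the second
# item becomes sh's $0, so echo runs with no arguments at all.
dropped = subprocess.run(
    ["echo", "$SOMEDIR"], shell=True, env=env, capture_output=True, text=True
).stdout.strip()

# No shell: nothing expands the variable, so echo prints the literal text.
literal = subprocess.run(
    ["echo", "$SOMEDIR"], env=env, capture_output=True, text=True
).stdout.strip()

print(repr(expanded))
print(repr(dropped))
print(repr(literal))
```

If the variable is not in the Python process's environment in the first place (for example, because it was set but never exported in the launching shell), the first form expands it to an empty string.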