Why we must use a list in subprocess.Popen?

Question

My question is more theoretical than practical. I've found plenty of answers that explain how, but not why, we should use a list in a subprocess.Popen call.

For example, as is well known:

Python 2.7.10 (default, Oct 14 2015, 16:09:02)
[GCC 5.2.1 20151010] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import subprocess
>>> cmd = subprocess.Popen(["python", "-V"], stdout=subprocess.PIPE)
Python 2.7.10

Then I was messing around in UNIX and found something interesting:

mvarge@ubuntu:~$ strace -f python -V 2>&1
execve("/usr/bin/python", ["python", "-V"], [/* 29 vars */]) = 0

Probably execve and the list model that subprocess uses are somehow related, but can anyone give a good explanation for this?

Thanks in advance.

It's convenient, for one thing. When you have an argument that contains a space and two different kinds of quotation marks, it requires quite a bit to be able to put it in a string in such a way that Bash will treat it like one argument. When it is in a list, `Popen` will take care of that for you. — zondo, Mar 31 '16 at 12:20
As an aside, you should not normally use `subprocess.Popen()` directly. It happens to work in some cases on some platforms, but in the general case it only *starts* a subprocess, and you need several additional interactions to properly run and shut down the process. The wrappers in the `subprocess` library take care of this and shield you from the underlying complexities; only when they are inadequate for your needs should you turn to the underlying workhorse functions. In this case, `subprocess.call()` would be the tool of choice. — tripleee, Mar 31 '16 at 13:41

tripleee · Answer 1 · 2016-03-31T12:44:21.313

The underlying C-level representation is a char *[] array (an array of pointers to C strings). Representing this as a list in Python is just a very natural and transparent mapping.

You can use a string instead of a list with shell=True; the shell is then responsible for parsing the command line into a char *[] array. However, the shell adds a number of pesky complexities; the many existing questions about why you want to avoid shell=True explain them in detail.
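To illustrate the difference, here is a minimal sketch (not part of the original answer). It assumes a child Python process started via sys.executable; the names CHILD, run_with_list and tricky are made up for the example. Each list element should reach the child as exactly one argument, with no quoting needed, while shlex.split() shows roughly the kind of splitting a POSIX shell would otherwise have to do on a string.

```python
import ast
import shlex
import subprocess
import sys

# Hypothetical child program: print the arguments it received as a Python list.
CHILD = "import sys; print(sys.argv[1:])"


def run_with_list(args):
    """Run the child with an argv list (no shell) and return what it received."""
    out = subprocess.check_output([sys.executable, "-c", CHILD] + list(args))
    return ast.literal_eval(out.decode().strip())


# Spaces, both kinds of quotes, and a shell metacharacter: no escaping required.
tricky = ["two words", "it's \"quoted\"", "$HOME"]
received = run_with_list(tricky)

# A string has to be split by someone; shlex.split() approximates POSIX shell rules.
split_by_shlex = shlex.split("python -V")
```

With a list, the child should see the three elements of tricky unchanged, including the literal text $HOME, because no shell ever gets a chance to interpret them.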

The command-line arguments argv and the environment envp are just two of many OS-level structures which are essentially null-terminated arrays of strings.
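As a rough sketch of how both structures surface in Python (an illustration, not something from the original answer): the args list maps to argv and the env mapping maps to envp. The variable name DEMO_VAR and the child snippet are hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical child: echo its first argument and one environment variable.
CHILD = "import os, sys; print(sys.argv[1]); print(os.environ['DEMO_VAR'])"

# Copy the parent's environment and add one illustrative variable.
env = dict(os.environ)
env["DEMO_VAR"] = "from-parent"

# args becomes the child's argv; env becomes the child's environment.
out = subprocess.check_output([sys.executable, "-c", CHILD, "hello"], env=env)
lines = out.decode().splitlines()
```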

jfs · Answer 2 · 2016-03-31T18:58:44.723

A process is an OS-level abstraction: to create a process, you have to use the OS API, which dictates what you should pass. A list is not strictly necessary; for example, a string (lpCommandLine) is the native interface on Windows (CreateProcess()). POSIX uses execv(), and therefore its native interface is a sequence of arguments (argv). Naturally, Python's subprocess module uses these interfaces to run external commands (create new processes).

The technical (uninteresting) answer is that in "why we must", the "must" part is not correct, as Windows demonstrates.

To understand "why it is", you could ask the creators of the CreateProcess() and execv() functions.

To understand "why we should" use a list, compare the Unix (list) and Windows (string) entries in the table of contents of How Command Line Parameters Are Parsed: a task that should be simple is complicated on Windows.

The main difference is that on POSIX the caller is responsible for splitting a command line into separate parameters, while on Windows the command itself parses its parameters. Different programs may and do use different algorithms to parse them. The subprocess module uses the MS C runtime rules (subprocess.list2cmdline()) to combine the args list into a command line. It is much harder for a programmer to understand how the parameters might be parsed on Windows.
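For a feel of what that combining step looks like, here is a small sketch using subprocess.list2cmdline(), which is an undocumented helper, so treat its use here as illustrative rather than a supported API. The program name prog and the arguments are made up.

```python
import subprocess

# Hypothetical argument list: one plain word, one with a space, one with quotes.
args = ["prog", "a b", 'say "hi"']

# Join the list into a single command line using the MS C runtime quoting rules.
cmdline = subprocess.list2cmdline(args)
print(cmdline)
```

Arguments containing whitespace get wrapped in double quotes, and embedded double quotes are backslash-escaped. A program that parses its command line with different rules may not recover the original list.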
