Simple question that I can't find an answer to:

Is there a benefit of using os.mkdir("somedir") over os.system("mkdir somedir") or subprocess.call(), beyond code portability?

Answers should apply to Python 2.7.

Edit: the point was raised that a hard-coded directory versus a variable (possibly containing user-defined data) introduces the question of security. My original question was meant from a system standpoint (i.e. what's going on under the hood), but security concerns are valid and should be part of a complete answer, as should directory names containing spaces.

TheOx
  • Well, `somedir` is a variable in the first case. – Peter Wood Jun 15 '15 at 20:34
  • One difference is `os.mkdir` will raise an exception if something goes wrong, while you'll have to check the output and/or return value otherwise. – Aereaux Jun 15 '15 at 20:35
  • The return value isn't good enough. Quoting from the manual: "DIAGNOSTICS The mkdir utility exits 0 on success, and >0 if an error occurs." -- it doesn't tell you *what kind* of error. – Charles Duffy Jun 15 '15 at 20:44
  • somedir being a variable doesn't change the question but I've edited it anyway – TheOx Jun 15 '15 at 20:44
  • @TheOx, per your edits, you mean to ask this only with a hardcoded variable name? It *does* change the question, because names you don't control have more issues (security ones, for instance). If you want a maximally general answer, you should ask with the name being a variable in all cases. – Charles Duffy Jun 15 '15 at 20:47
  • @CharlesDuffy The focus of the question was meant to be from a system standpoint (i.e. the underlying mechanisms behind the various options) but security issues for user-defined input are most definitely valid and I should have more clearly defined the question. – TheOx Jun 15 '15 at 20:50
  • @TheOx It does make a difference that it's a variable. The value would need quoting and escaping correctly if you passed it to `os.system`. It's more difficult to use correctly. – Peter Wood Jun 15 '15 at 21:49

1 Answer

Correctness

Think about what happens if your directory name contains spaces:

mkdir hello world

...creates two directories, hello and world. And blindly wrapping the name in quotes won't work if the name itself contains that type of quote:

'mkdir "' + somedir + '"'

...does very little good when somedir contains hello "cruel world".d.
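
As a rough illustration of the quoting problem, the sketch below uses shlex.split to approximate how a POSIX shell would split each command line into words. The helper names naive_command and quoted_command are hypothetical. The quote function is shlex.quote on Python 3 and pipes.quote on Python 2.7.

```python
import shlex

try:
    from shlex import quote   # Python 3.3+
except ImportError:
    from pipes import quote   # Python 2.7

def naive_command(dirname):
    # Hypothetical helper: wraps the name in double quotes without escaping.
    return 'mkdir "' + dirname + '"'

def quoted_command(dirname):
    # Hypothetical helper: lets the standard library do the shell quoting.
    return 'mkdir -- ' + quote(dirname)

name = 'hello "cruel world".d'

# shlex.split approximates POSIX shell word splitting.
naive_args = shlex.split(naive_command(name))
safe_args = shlex.split(quoted_command(name))

print(naive_args)  # the name is split into two different words
print(safe_args)   # the name survives intact as a single argument
```

Even with correct quoting, this only matters if a shell is involved at all; os.mkdir(name) never goes through word splitting in the first place.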


Security

In the case of:

os.system('mkdir somedir')

...consider what happens if the value you're substituting for somedir is ./$(rm -rf /)/hello: the shell performs the command substitution, running rm -rf /, before mkdir ever sees the name.

Also, calling os.system() (or subprocess.call() with shell=True) invokes a shell, which means that you can be open to bugs such as ShellShock; if your /bin/sh were provided by a ShellShock-vulnerable bash, and your code provided any mechanism for arbitrary environment variables to be present (as is the case with HTTP headers via CGI), this would provide an opportunity for code injection.
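
By contrast, os.mkdir() hands the name to the operating system as plain data, so shell syntax in it is never interpreted. A minimal sketch, using a harmless stand-in payload ($(echo injected), chosen here purely for illustration) inside a throwaway temporary directory:

```python
import os
import shutil
import tempfile

# Harmless stand-in for a hostile name; no shell ever sees this string.
hostile_name = '$(echo injected)'

base = tempfile.mkdtemp()
try:
    # The name is passed through as-is, so a directory with this
    # literal name is created and nothing is executed.
    os.mkdir(os.path.join(base, hostile_name))
    entries = os.listdir(base)
finally:
    shutil.rmtree(base)

print(entries)
```

The same string interpolated into an os.system() command line would be expanded by the shell before mkdir ran.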


Performance

os.system('mkdir somedir')

...starts a shell:

/bin/sh -c 'mkdir somedir'

...which then needs to be linked and loaded; needs to parse its arguments; and needs to invoke the external command mkdir (meaning another link and load cycle).


A significant improvement is the following:

subprocess.call(['mkdir', '--', somedir], shell=False)

...which only invokes the external mkdir command, with no shell; however, as it involves a fork()/exec() cycle, this is still a significant performance penalty over the C-library mkdir() call.

In the case of os.mkdir(somedir), the Python interpreter directly invokes the appropriate syscall -- no external commands at all.
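
To get a rough feel for the difference, one could time the two approaches side by side. This is only a sketch: it assumes a POSIX system with an external mkdir on the PATH, time_call is a hypothetical helper, and a single wall-clock measurement is noisy, so treat the numbers as indicative at best.

```python
import os
import shutil
import subprocess
import tempfile
import time

def time_call(func, *args):
    # Hypothetical helper: wall-clock duration of a single call.
    start = time.time()
    func(*args)
    return time.time() - start

base = tempfile.mkdtemp()
try:
    # In-process call: no new process is created.
    direct_seconds = time_call(os.mkdir, os.path.join(base, 'direct'))

    # External command without a shell: still a fork()/exec() cycle.
    external_seconds = time_call(
        subprocess.call, ['mkdir', '--', os.path.join(base, 'external')])

    created = sorted(os.listdir(base))
finally:
    shutil.rmtree(base)

print('os.mkdir:        %.6f s' % direct_seconds)
print('subprocess.call: %.6f s' % external_seconds)
```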


Error Handling

If you call os.mkdir('somedir') and it fails, an OSError is raised with the appropriate errno set, and you can trivially determine the type of the error.

If the mkdir external command fails, you get a failed exit status, but no handle on the actual underlying problem without parsing its stderr (which is written for humans, not machine readability, and which will vary in contents depending on the system's current locale).
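
A minimal sketch of what that looks like in practice. The make_dir helper and its return strings are hypothetical; os.mkdir raises OSError, and the errno constants come from the standard errno module.

```python
import errno
import os
import shutil
import tempfile

def make_dir(path):
    # Hypothetical helper: maps common failure modes to short labels.
    try:
        os.mkdir(path)
    except OSError as exc:
        if exc.errno == errno.EEXIST:
            return 'already exists'
        if exc.errno == errno.ENOENT:
            return 'parent missing'
        if exc.errno == errno.EACCES:
            return 'permission denied'
        raise  # anything else is unexpected; let it propagate
    return 'created'

base = tempfile.mkdtemp()
try:
    target = os.path.join(base, 'somedir')
    first = make_dir(target)    # directory does not exist yet
    second = make_dir(target)   # same path again
    third = make_dir(os.path.join(base, 'missing', 'child'))
finally:
    shutil.rmtree(base)

print(first, second, third)
```

With the external command, all three failure cases would collapse into the same non-zero exit status.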

Charles Duffy
  • Also, if the directory path has spaces in it, the path needs quoting. And possibly escaping too. – Peter Wood Jun 15 '15 at 20:44
  • @PeterWood, quite right. Added "correctness" as its own section. – Charles Duffy Jun 15 '15 at 20:46
  • @CharlesDuffy This is a great answer. I feel I've learned a lot about how to answer questions well. – Peter Wood Jun 15 '15 at 20:49
  • What about cross-platform compatibility? `os.system("mkdir...` might not be the right command to create a folder on the underlying OS, and the filesystem/shell quoting rules might be different. http://stackoverflow.com/a/16617835/478656 has Martijn Pieters' explanation of os.mkdir on Linux / Windows and links to the CPython source code. – TessellatingHeckler Jun 15 '15 at 20:53
  • @TessellatingHeckler, indeed. The OP acknowledged portability as an issue in their question, however, so that one was already known / on the table. (Thank you for the link -- that's a useful addition!) – Charles Duffy Jun 15 '15 at 20:54