When a script is invoked explicitly with python, the argv is mucked with so that argv[0] is the path to the script being run. This is the case if invoked as python foo/bar.py or even as python -m foo.bar.

I need a way to recover the original argv (i.e. the one received by python). Unfortunately, it's not as easy as prepending sys.executable to sys.argv, because python foo/bar.py is different from python -m foo.bar (the implicit PYTHONPATH differs, which can be crucial depending on your module structure).

More specifically in the cases of python foo/bar.py some other args and python -m foo.bar some other args, I'm looking to recover ['python', 'foo/bar.py', 'some', 'other', 'args'] and ['python', '-m', 'foo.bar', 'some', 'other', 'args'], respectively.
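To make the problem concrete, here is a minimal sketch that launches a child interpreter on a throwaway script and compares what was launched with what the child sees in sys.argv. The helper name child_argv and the temporary bar.py are hypothetical, used only for illustration.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_argv(extra_args):
    """Run a tiny script in a child interpreter and return (launched, seen).

    `launched` is the full argv handed to the child process; `seen` is what
    the child reports as its own sys.argv.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "bar.py")  # hypothetical stand-in for foo/bar.py
        with open(script, "w") as f:
            f.write("import json, sys\nprint(json.dumps(sys.argv))\n")
        launched = [sys.executable, script] + list(extra_args)
        result = subprocess.run(launched, capture_output=True, text=True, check=True)
        return launched, json.loads(result.stdout)


launched, seen = child_argv(["some", "other", "args"])
print("launched:", launched)
print("sys.argv:", seen)
```

The child's sys.argv starts at the script path, so the interpreter itself (and any interpreter flags such as -m) is not visible there.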

I am aware of prior questions about this:

But these seem to misunderstand how shells work, and the answers reflect this. I am not interested in undoing the work of the shell (e.g. evaluated shell vars and functions are fine); I just want to get at the original argv given to python.

The only solution I've found is to use /proc/<PID>/cmdline:

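import os
# Linux-only: /proc/<PID>/cmdline holds the NUL-separated argv of the process.
with open("/proc/{}/cmdline".format(os.getpid()), 'rb') as f:
  # Binary mode yields bytes, so split on b'\0'; the trailing NUL leaves an empty last item.
  original_argv = f.read().split(b'\0')[:-1]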

This does work, but it is Linux-only (no OSX, and Windows support seems to require installing the wmi package). Fortunately for my current use case this restriction is fine. But, it would be nice to have a cleaner, cross platform approach.

The fact that the /proc/<PID>/cmdline approach works gives me hope that python isn't execing before it runs the script (at least not the syscall exec, but maybe the exec builtin). I remember reading somewhere that all of this argument handling (e.g. -m) is done in pure python, not C (this is confirmed by the fact that python -m this.does.not.exist will produce an exception that looks like it came from the runtime). So, I'd venture a guess that somewhere in pure python the original argv is available (perhaps this requires some spelunking through the runtime initialization?).

tl;dr Is there a cross platform (builtin, preferably) way to get at the original argv passed to python (before it removes the python executable and transforms -m blah into blah.py)?

edit From spelunking, I discovered Py_GetArgcArgv, which can be accessed via ctypes (found it here, links to several SO posts that mention this approach):

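import ctypes

# Out-parameters for Py_GetArgcArgv: argv is an array of wide strings on Python 3.
_argv = ctypes.POINTER(ctypes.c_wchar_p)()
_argc = ctypes.c_int()

# CPython-only: ask the interpreter for the argc/argv it recorded at startup.
ctypes.pythonapi.Py_GetArgcArgv(ctypes.byref(_argc),
                                ctypes.byref(_argv))

# Slice the C array down to argc entries to get a regular list.
argv = _argv[:_argc.value]
print(argv)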

Now this is OS-portable, but not python implementation portable (it only works on cpython, and ctypes is yucky if you don't need it). Also, peculiarly, I don't get the right output on Ubuntu 16.04 (python -m foo.bar gives me ['python', '-m', '-m']), but I may just be making a silly mistake (I get the same behavior on OSX). It would be great to have a fully portable solution (that doesn't dig into ctypes).

Bailey Parker
  • Related: https://stackoverflow.com/q/44862323/7051394 – Right leg Mar 22 '18 at 13:06
  • 1
    what about creating a wrapper C program for python which stores the arguments in a file, and passes the filename as env. variable to python to read from? (and also calls python, it's a wrapper). ugly but would work, and portable. – Jean-François Fabre Mar 22 '18 at 13:08
  • @Rightleg If I understand that question correctly, that's exactly what I'm not looking for. They seem to be interested in the *unexpanded* args. I don't mind the expansion (or anything else done by the shell). I only care about python not removing the leading `python` argument and replacing `-m stuff` with `stuff.py` (for example). – Bailey Parker Mar 22 '18 at 13:09
  • @Jean-FrançoisFabre Yeah either that or a bash script. That would work fine if there was only one script. Unfortunately, my use case is running unit tests (which can be done many ways: by `unittest discover`, `setup.py`, or individually with `-m tests.test_something`). It wouldn't be ideal to have to create wrappers for every test file (and the other methods of launching tests). – Bailey Parker Mar 22 '18 at 13:11
  • 2
    I mean: call the wrapper "python" and put it in the path _before_ your original python (that you locate it from your wrapper by being second in the path). I admit this isn't optimal. A PEP could be opened to ask for such a feature like `sys.original_argv` or such. – Jean-François Fabre Mar 22 '18 at 13:21
  • @Jean-FrançoisFabre I see! That's clever. That's a decent cross platform (and cross implementation) workaround, although it then requires users to use that wrapper (the context here is an open source package, so it seems a little burdensome to require that). I'll dig into the PEP history to see if this has been discussed before. – Bailey Parker Mar 22 '18 at 13:24
  • at least you did get upvotes for your question. That's something. – Jean-François Fabre Mar 22 '18 at 13:25
  • 4
    May I ask what the rationale behind getting the original argv? What is your ultimate goals? Perhaps knowing that, people of SO might be able to help. – Hai Vu Mar 22 '18 at 13:25
  • 1
    @HaiVu rationale is in the context of randomized/nondeterministic tests (with some random seed chosen before all tests are run), I'd like to print out a helpful error message on test failures that's like: `Randomized test failed. Run this to reproduce: SEED=123 python -m however.tests.were.run.before`. The idea here is that you can directly copy and paste that command to re-run with the same seed. The random stuff is trivial, but I need to be able to get at the original argv to produce something that can be copied & pasted then ran. (rspec does something like this) – Bailey Parker Mar 22 '18 at 13:32
  • note that you'll have to adapt your command line: `SEED=123 python -m however.tests.were.run.before` isn't multiplatform :) it doesn't work on windoze – Jean-François Fabre Mar 22 '18 at 13:33
  • I agree with @Jean-FrançoisFabre. One alternative is to pass `--seed=123` as argument. If the caller does not specify the `--seed` flag, a default is generated. – Hai Vu Mar 22 '18 at 13:37
  • @Jean-FrançoisFabre Yep ;) For my case I'm targeting *nix-like so this is fine (but not all of them support `/proc/<PID>/cmdline` so I need something more cross platform than that). For this subset of the problem (the argv), having a python-endorsed cross-platform/implementation approach would be swell! – Bailey Parker Mar 22 '18 at 13:37
  • @HaiVu Yeah that's a perfectly fine solution too. Either way, though, I need to get at the original argv! And it seems like there isn't a nice way to do that :( – Bailey Parker Mar 22 '18 at 13:39
  • You assume there is a command line which you could make use of. Which is not the case when you embed Python in another application as a library. Wanting it to work everywhere (all OS-s, all Python-s) is probably a bit too much. – tevemadar Mar 22 '18 at 13:40
  • @tevemadar I'm not exactly sure what you mean by this, but unless you're compiling against a python implementation and manually calling into (presumably undocumented) APIs, at some point (in C) to get into python land you have to do `execv("path/to/python/interpreter", {"interpreter", /* ...args */})` (or `CreateProcess` on Windows, for example). I'm interested in getting at the `argv` here. – Bailey Parker Mar 22 '18 at 13:44
  • @BaileyParker "unless you're compiling against a python implementation and manually calling into" - that is what a library looks like (think about shared objects, dll-s, whatever). "presumably undocumented APIs" - why would it be undocumented? https://docs.python.org/3/extending/embedding.html - it is completely official and documented. – tevemadar Mar 22 '18 at 13:46
  • @tevemadar Fair enough. So from a cursory reading of these docs, it seems like you can set `argv`. So, it would seem reasonable to require `original_argv` to be the argv of the program invoking python (maybe this is optional since that would be a breaking API change). In any case, this is pretty tangential to the original question and I don't see how it precludes (in the case of invoking via the interpreter) getting at the original argv in a portable way. – Bailey Parker Mar 22 '18 at 13:55

3 Answers

5

Python 3.10 adds sys.orig_argv, which the docs describe as the arguments originally passed to the Python executable. If this isn't exactly what you're looking for, it may be helpful in this or similar cases.

There were a bunch of possibilities considered, including changing sys.argv, but this was, I think, wisely chosen as the most effective and non-disruptive option.
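A minimal sketch of how this might be used, assuming a fallback is wanted on interpreters older than 3.10. The helper name original_argv is hypothetical, and the fallback is only an approximation: it cannot reconstruct -m invocations or interpreter flags.

```python
import sys


def original_argv():
    """Return the arguments the interpreter was started with, as best we can tell."""
    orig = getattr(sys, "orig_argv", None)  # present on Python 3.10 and later
    if orig is not None:
        return list(orig)
    # Assumption: on older versions, approximate by prepending the interpreter
    # path. This loses -m and other interpreter options.
    return [sys.executable] + sys.argv


if __name__ == "__main__":
    print(original_argv())
```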

  • This doesn't quite work. If you look at the [comment](https://stackoverflow.com/questions/49429412/recovering-original-argv#comment85861526_49429412), the O.P. wants to recover the cmdline _with variables_ like `SEED=123 python ...` and even `sys.orig_argv` will not recover that part. – wim Oct 03 '22 at 16:44
  • @wim Thanks. I wasn't sure it would resolve the issue of the O.P., but thought this information might be helpful to others who find this question searching for answers. While "I am not interested in undoing the work of the shell" suggests recovering SEED=123 isn't of interest, the comment does indicate otherwise. I think the "helpful message" would have to prepend all environment variables affecting the result, as there is no guarantee that they were set on the command line during that run. –  Oct 03 '22 at 17:23
2

This seems like an XY problem: you are getting into the weeds in order to accommodate some existing complicated test setup (I've found the question behind the question in your comment). Further effort would be better spent writing a sane test setup.

  1. Use a better test runner, not unittest.
  2. Create any initial state within the test setup, not in the external environment before entering the Python runtime.
  3. Use a plugin for the randomization and seed stuff, personally I use this one but there are others.

For example if you decide to go with pytest runner, all the test setup can be configured within a [tool.pytest.ini_options] section of the pyproject.toml file and/or with a fixture defined in conftest.py. Overriding the default test configuration can be done with environment variables and/or command line arguments, and neither of these approaches will get mucked around by the shell or during Python interpreter startup.
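Independent of the runner, the seed handling itself can live inside the test setup. The following is a rough sketch rather than anything pytest-specific: the function name resolve_seed is hypothetical, and the SEED variable name is borrowed from the question's comments.

```python
import os
import random


def resolve_seed(environ=None):
    """Return the seed from the SEED environment variable, or generate a fresh one."""
    environ = os.environ if environ is None else environ
    raw = environ.get("SEED")
    if raw is not None:
        return int(raw)
    # No override given: pick a seed that can be reported and reused later.
    return random.SystemRandom().randrange(2**32)


# Demonstration with an explicit mapping instead of the real environment.
seed = resolve_seed({"SEED": "123"})
random.seed(seed)
print("Using SEED={}".format(seed))
```

Because the seed is resolved and reported from inside the test setup, a failure message only needs to print the seed, not the whole original command line.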

The manner in which to execute the test suite can and should be as simple as executing a single command:

pytest

And then your perceived problem of needing to recover the original sys.argv will go away.

wim
  • If you remove the XY problem from the question, you'll still have a valid problem of recovering the original argv, which is needed, for example, to modify the LD_LIBRARY_PATH (c.f. https://stackoverflow.com/questions/23244418/set-ld-library-path-before-importing-in-python/61653306#comment109088355_49457468) – Antti Haapala -- Слава Україні May 08 '20 at 04:34
0

Your stated problem is:

  1. User called my app with environment variables and arguments.
  2. I want to display a "run like this" diagnostic that will exactly reproduce the results of the current run.

There are at least two solutions:

  1. Abandon the "reproduction" aspect, since the original bash calling command is lost to the portable python app, and instead go for "same effect".
  2. Use a wrapper to capture the original calling command, as suggested by Jean-François Fabre.

With (1) you would be willing to accept ['-m', 'foo'] becoming ['foo.py'], or even turning it into ['/some/dir/foo.py'] in case PYTHONPATH could cause trouble. Displaying ['a', 'b c'] as "a" "b c", or more concisely as a "b c", is straightforward. If environment variables like SEED are an important part of the command line interface then you'll need to iterate over envp and output them, as well. For true reproducibility, you might choose to convert input args to canonical form, compare with observed input args, and exec using the canonical form if they're not identical, so there's no way to execute the bulk of your code using "odd" syntax.
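The "same effect" display could be sketched roughly as follows, assuming a POSIX shell is the target. The function name repro_command is hypothetical; shlex.quote is from the standard library and only adds quotes where the shell needs them.

```python
import shlex


def repro_command(argv, env_overrides):
    """Build a copy-pasteable POSIX shell command from argv and selected env vars."""
    env_part = [
        "{}={}".format(name, shlex.quote(str(value)))
        for name, value in sorted(env_overrides.items())
    ]
    arg_part = [shlex.quote(arg) for arg in argv]
    return " ".join(env_part + arg_part)


print(repro_command(["python", "-m", "foo.bar", "b c"], {"SEED": 123}))
# prints: SEED=123 python -m foo.bar 'b c'
```

Only the variables passed in env_overrides are shown, so the caller decides which parts of the environment matter for reproducing the run.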

With (2) you would bury the app in some inconveniently named file, advertise the wrapper program far and wide, and enjoy the benefits of seeing args before they're munged.

J_H