7

(This question was asked here, but the answer was Linux-specific; I'm running on FreeBSD and NetBSD systems which (EDIT: ordinarily) do not have /proc.)

Python seems to dumb down argv[0], so you don't get what was passed in to the process, as a C program would. To be fair, sh and bash and Perl are no better. Is there any way I can work around this, so my Python programs can get that original value? I have administrative privileges on this FreeBSD system, and can do things like changing everyone's default PATH environment variable to point to some other directory before the one that contains python2 and python3, but I don't have control over creating /proc. I have a script which illustrates the problem. First, the script's output:

the C child program gets it right: arbitrary-arg0 arbitrary-arg1
the python2 program dumbs it down: ['./something2.py', 'arbitrary-arg1']
the python3 program dumbs it down: ['./something3.py', 'arbitrary-arg1']
the sh script       dumbs it down: ./shscript.sh arbitrary-arg1
the bash script     dumbs it down: ./bashscript.sh arbitrary-arg1
the perl script drops arg0:        ./something.pl arbitrary-arg1

... and now the script:

#!/bin/sh

set -e
rm -rf work
mkdir work
cd work
cat > childc.c << EOD; cc childc.c -o childc
#include <stdio.h>
int main(int    argc,
         char **argv
        )
{
  printf("the C child program gets it right: ");
  printf("%s %s\n",argv[0],argv[1]);
}
EOD
cat > something2.py <<EOD; chmod 700 something2.py
#!/usr/bin/env python2
import sys
print "the python2 program dumbs it down:", sys.argv
EOD
cat > something3.py <<EOD; chmod 700 something3.py
#!/usr/bin/env python3
import sys
print("the python3 program dumbs it down:", sys.argv)
EOD
cat > shscript.sh <<EOD; chmod 700 shscript.sh
#!/bin/sh
echo "the sh script       dumbs it down:" \$0 \$1
EOD
cat > bashscript.sh <<EOD; chmod 700 bashscript.sh
#!/usr/bin/env bash
echo "the bash script     dumbs it down:" \$0 \$1
EOD
cat > something.pl <<EOD; chmod 700 something.pl
#!/usr/bin/env perl
print("the perl script drops arg0:        \$0 \$ARGV[0]\n")
EOD
cat > launch.c << EOD; cc launch.c -o launch; ./launch
#include <sys/types.h>
#include <sys/wait.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int    argc,
         char **argv,
         char **arge)
{
  int    child_status;
  size_t program_index;
  pid_t  child_pid;

  char  *program_list[]={"./childc",
                         "./something2.py",
                         "./something3.py",
                         "./shscript.sh",
                         "./bashscript.sh",
                         "./something.pl",
                         NULL
                        };

  char  *some_args[]={"arbitrary-arg0","arbitrary-arg1",NULL};

  for(program_index=0;
      program_list[program_index];
      program_index++
     )
  {
    child_pid=fork();

    if(child_pid<0)
    {
      perror("fork()");
      exit(1);
    }
    if(child_pid==0)
    {
      execve(program_list[program_index],some_args,arge);
      perror("execve");
      exit(1);
    }
    wait(&child_status);
  }

  return 0;
}
EOD
Bill Evans at Mariposa
  • 1
    Just curious, what is the use case of it? – Pavel Shishmarev Dec 12 '19 at 05:28
  • (a) The other stackoverflow item which I've linked at the top of my entry, itself points to a non-stackoverflow item which raises the same question, and alleges that CUPS actually uses argv[0] to contain a URL! (b) It's complicated, but I'll have a directory tree with identically named python programs sitting in each, and each will want to know EXACTLY how the user got there. – Bill Evans at Mariposa Dec 12 '19 at 05:40
  • 1
    This has less to do with the scripting languages being badly behaved and more to do with the shebang mechanism and how it works. Some details are [here](https://www.in-ulm.de/~mascheck/various/shebang/). Make note of the table towards the bottom... very different behavior across OSes in many cases. The CUPS situation stinks... can you look at the `DEVICE_URI` environment variable instead? – John Szakmeister Dec 14 '19 at 11:35
  • @JohnSzakmeister: Thanks for the link, most edifying. And yes, it's a shebang thing (at least); the demo script in my answer shows how this can be overcome with a few scripting languages, and maybe others could be added to the list. The CUPS situation is ugly (which, I guess, makes my original question kinda ugly). But since C programs can handle an arbitrary argv[0], there's no reason not to extend that to the shebang situation as well. I neither maintain nor even use CUPS, but thought the URL use was ... droll. – Bill Evans at Mariposa Dec 14 '19 at 18:50

3 Answers

2

What follows is a generally useful answer to what I meant to ask.

The answer that kabanus gave is excellent, given the way I phrased the problem, so of course he gets the up-arrow and the checkmark. The transparency is a beautiful plus, in my opinion.

But it turns out that I didn't specify the situation completely. Each python script starts with a shebang, and the shebang feature makes it more complicated to launch a python script with an artificial argv[0].

Also, transparency isn't my goal; backward compatibility is. I would like the normal situation to be that sys.argv works as shipped, right out of the box, without my modifications. Also, I would like any program which launches a python script with an artificial argv[0] not to have to worry about any additional argument manipulation.

Part of the problem is to overcome the "shebang changing argv" problem.

The answer is to write a small wrapper in C for each script; the launching program executes that wrapper instead of the actual script. The script, running as a child of the wrapper, then looks at the arguments of its parent process (the wrapper) to recover the original argv[0].

The cool thing is that this can work for script types other than python. You can download a proof of concept here which demonstrates the solution for python2, python3, sh, bash, and perl. You'll have to change each CRLF to LF, using dos2unix or fromdos. This is how the python3 script handles it:

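import os
import subprocess

def get_arg0():
    # Ask ps(1) for the command line of the parent process (the C wrapper);
    # its first word is the argv[0] that the launcher supplied.
    return subprocess.run("ps -p %s -o 'args='" % os.getppid(),
                          shell=True,
                          stdout=subprocess.PIPE,
                          stderr=subprocess.PIPE
                         ).stdout.decode(encoding='latin1').split(sep=" ")[0]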

The solution does not rely on /proc, so it works on FreeBSD as well as Linux.
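
For illustration, here is a minimal sketch of the same idea that avoids the shell by passing ps its arguments as a list, and that strips the trailing newline ps emits. The names first_word and get_parent_arg0 are hypothetical, not part of the downloadable proof of concept. Note the limitation shared with the version above: an argv[0] that itself contains a space is cut at the first space.

```python
import os
import subprocess


def first_word(ps_args):
    """Return the first space-separated word of a `ps -o args=` output line."""
    return ps_args.rstrip("\n").split(" ")[0]


def get_parent_arg0():
    """Ask ps(1) for the parent's command line and return its first word."""
    result = subprocess.run(
        ["ps", "-p", str(os.getppid()), "-o", "args="],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=False,
    )
    # latin1 maps every byte to a character, so decoding cannot fail.
    return first_word(result.stdout.decode("latin1"))
```

Splitting the parsing out of the ps call keeps the string handling easy to check on its own.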

Bill Evans at Mariposa
1

What I think is the path of least resistance here is a bit hacky, but would probably work on any OS. Basically, you double-wrap your Python calls. First (using Python 3 as an example), the python3 on your PATH is replaced by a small C program, which you know you can trust:

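#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

int main(int argc, char **argv) {
    // "python3" below should be replaced by the path to the original one.
    // In my tests I named this program wrap_python, so there was no problem,
    // but if you are changing this system-wide (and calling the wrapper
    // python3), you can't leave this: the wrapper would invoke itself.
    const char *const program = "python3 wrap_python.py";
    size_t size = strlen(program) + 1; // + 1 for the terminating null character
    for (int count = 0; count < argc; ++count)
        size += strlen(argv[count]) + 1; // + 1 for the separating space

    char *cmd = malloc(size);
    if (!cmd) exit(EXIT_FAILURE);
    cmd[0] = '\0';
    strcat(cmd, program);
    // Append the real arguments first...
    for (int count = 1; count < argc; ++count) {
        strcat(cmd, " ");
        strcat(cmd, argv[count]);
    }
    // ...then the true argv[0] as the last argument.
    strcat(cmd, " ");
    strcat(cmd, argv[0]);
    // Arguments are not shell-quoted, so spaces or metacharacters will break.
    int status = system(cmd);
    free(cmd);
    // system() returns a wait status; convert it to a plain exit code.
    return WIFEXITED(status) ? WEXITSTATUS(status) : EXIT_FAILURE;
}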

You can make this faster, but hey, premature optimization?

Note that we are calling a script named wrap_python.py (you would probably need a full path here). We want to pass the "true" argv, but some work is needed on the Python side to make this transparent. The true argv[0] is passed as the last argument, and wrap_python.py is:

from sys import argv
argv[0] = argv.pop(-1)
print("Passing:", argv) # Delete me
exit(exec(open(argv[1]).read())) # Different in Python 2. Close the file handle if you're pedantic.

Our small wrapper pops the true argv[0], supplied by the C wrapper, off the end of the argument list, stores it in argv[0], and then manually executes the target script in the same context. Specifically, __name__ == "__main__" is true.
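
As a rough sketch of that last step, the exec(open(...).read()) line could be written as a function that closes the file handle and compiles the source under the script's own file name, so tracebacks point at the real file. The name run_as_main is hypothetical, and unlike wrap_python.py this variant runs the script in a fresh globals dictionary rather than in the wrapper's own namespace.

```python
import sys


def run_as_main(script_path, true_arg0, script_args):
    """Execute script_path as __main__ with sys.argv[0] set to true_arg0."""
    # Same layout as wrap_python.py produces: true argv[0], script, then args.
    sys.argv = [true_arg0, script_path, *script_args]
    with open(script_path) as handle:
        code = compile(handle.read(), script_path, "exec")
    script_globals = {"__name__": "__main__", "__file__": script_path}
    exec(code, script_globals)
    return script_globals
```

The standard runpy.run_path helper is deliberately not used here, because it overwrites sys.argv[0] with the script path while the script runs, which would defeat the purpose.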

This would be run as

python3 my_python_script arg1 arg2 etc...

where python3 on your PATH now resolves to the C wrapper above. Testing this on

import sys
print(__name__)
print("Got", sys.argv)

yields

__main__
Got ['./wrap_python', 'test.py', 'hello', 'world', 'this', '1', '2', 'sad']

Note I called my program wrap_python - you want to name it python3.

kabanus
  • Given the way I worded the problem, your solution is excellent, and adds transparency into the bargain. It turns out that I didn't describe the situation completely. I've posted a different answer which addresses that situation; perhaps someone can make use of it. – Bill Evans at Mariposa Dec 13 '19 at 02:57
  • @BillEvansatMariposa Thanks, I see now what you meant and your answer is great. Good point about the shebang as well (for special handling), I'll have to think about whether that's detectable, since now I'm interested! – kabanus Dec 13 '19 at 04:57
  • The presence of the shebang is detectable: with open(sys.argv[0]) as phyle: xxx=phyle.readline() will do it. I have not been able to figure out how a program can detect whether, if it starts with a shebang, the command line was simply the program name, or whether it was "python3" (or variants) followed by the program name. Examining the "ps" line for that process or its parent doesn't seem to help. – Bill Evans at Mariposa Dec 13 '19 at 09:35
0

Use Python's ctypes module to get the "program name" which by default is set to argv[0]. See Python source code here. For example:

import ctypes

GetProgramName = ctypes.pythonapi.Py_GetProgramName
GetProgramName.restype = ctypes.c_wchar_p

def main():
    print(GetProgramName())

if __name__ == '__main__':
    main()

Running the command prints:

$ exec -a hello python3 name.py 
hello
vz0
  • This gives the program name, you're correct. I was seeking, though, access to an arbitrary sys.argv[0], which might not be anything like the program name. Just for kicks and grins, I put your statements (with a "print" function call around the final statement) in a script and tested it. The arbitrary sys.argv[0] did not shine through. – Bill Evans at Mariposa Dec 14 '19 at 19:03
  • 1
    @BillEvansatMariposa by default the program name in python is set to argv[0]. – vz0 Dec 14 '19 at 19:18
  • "is set to" could mean either that the program name is copied into argv[0], or that the program name is set to be whatever's in argv[0]; in other words, two opposite meanings. It's the case that the first of those two is correct. The quest here is to overcome the default, and let the python script have access to whatever may have been arbitrarily placed in argv[0] in the calling process's call to execve(2). And, as you can see, my answer fulfills that quest. – Bill Evans at Mariposa Dec 15 '19 at 02:28
  • Your example educated me, thanks! But it doesn't work with a shebang #!/usr/bin/env python3 at the beginning. – Bill Evans at Mariposa Dec 15 '19 at 09:40
  • Correction: it doesn't work with a shebang #!/usr/bin/env python3 at the beginning if you run that script directly, without having "python3" on the command line. – Bill Evans at Mariposa Dec 15 '19 at 10:34