
I have a basic sourcing function:

import os
import subprocess

def source(
    fileName = None,
    update   = True
    ):
    pipe = subprocess.Popen(". {fileName}; env".format(
        fileName = fileName
    ), stdout = subprocess.PIPE, shell = True)
    data = pipe.communicate()[0]
    env = dict((line.split("=", 1) for line in data.splitlines()))
    if update is True:
        os.environ.update(env)
    return(env)

When I try to use it to source a particular script, I get the following error:

>>> source("/afs/cern.ch/sw/lcg/contrib/gcc/4.8/x86_64-slc6/setup.sh")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 9, in source
ValueError: dictionary update sequence element #51 has length 1; 2 is required

This arises from the following lines returned by the executable env:

BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`
}

The closing curly bracket is on line 51.

How should one source a Bash script from within Python in a robust, sensible way such that errors like this (and any other likely ones you can think of) are avoided?

d3pd
  • Why are you sourcing a shell script in python like this? What are you trying to do? Make shell variables into python variables? – Etan Reisner Feb 12 '15 at 00:05
  • What do you expect to happen when these lines are encountered? What you're running into is Bash shell script code, not just environment variables. –  Feb 12 '15 at 00:05
  • [Etan Reisner](http://stackoverflow.com/users/258523/etan-reisner) I'm actually trying to run a Python script that sets up a certain environment that then makes Python modules (that are actually bound to a larger infrastructure) available for import. [duskwuff](http://stackoverflow.com/users/149341/duskwuff) I'm trying to make the environment that would be created by sourcing the shell script available in the Python environment that has run the source procedure. In what way should I be doing this? The basic approach I have currently is not reliable enough at all. – d3pd Feb 12 '15 at 00:14
  • But isn't a "python module" simply a folder with python files in it? (with at least one file called `__init__.py`). I will try to answer your question, but your stated goal doesn't make sense to me. I don't know how a python script creates a python module. Unless you are dynamically making files in the filesystem in the script – Alexander Bird Feb 12 '15 at 00:32
  • Also, (and someone can correct me if I'm wrong), you simply _can't_ change the python process's ENV vars by creating a subprocess. When the subprocess sources the bash script, _only the subprocess_ has its ENV vars changed. Thus, after it exits, your script's process will have no changes. – Alexander Bird Feb 12 '15 at 00:35
  • Thanks for your help on this. The setup procedure makes available an infrastructure which Python can interact with. It's not a simple matter of determining an appropriate path for a module: in the setup procedure, executables and many other things make available Python bindings for C++ libraries via a particular module. What I'm trying to do is have a Python script which runs the setup procedure for the infrastructure and then imports a module that becomes available and functional following the setup procedure. – d3pd Feb 12 '15 at 00:39
  • There's more complexity to this, but it may help to know that I'm interacting with [PyROOT](https://root.cern.ch/drupal/content/pyroot). What I was illustrating in my very basic attempt was the setup of an environment in a subprocess and the extraction of the characteristics of that environment for application to the superprocess environment. – d3pd Feb 12 '15 at 00:43

2 Answers


The line you are seeing is the result of the script doing the following:

module() { eval `/usr/bin/modulecmd bash $*`; }
export -f module

That is, it is explicitly exporting the bash function module so that sub(bash)shells can use it.

We can tell from the format of the environment variable that you upgraded your bash in the middle of the shellshock patches. I don't think there is a current patch which would generate BASH_FUNC_module()= instead of BASH_FUNC_module%%()=, but iirc there was such a patch distributed during the flurry of fixes. You might want to upgrade your bash again now that things have settled down. (If that was a cut-and-paste error, ignore this paragraph.)

And we can also tell that /bin/sh on your system is bash, assuming that the module function was introduced by sourcing the shell script.

Probably you should decide whether you care about exported bash functions. Do you want to export module into the environment you are creating, or just ignore it? The solution below just returns what it finds in the environment, so it will include module.

In short, if you're going to parse the output of some shell command which tries to print the environment, you're going to have three possible issues:

  1. Exported functions (bash only), which look different pre- and post-shellshock patch, but always contain at least one newline. (Their values always start with () { so they are easy to identify. Post-shellshock, their names will be BASH_FUNC_funcname%%, but as long as both pre- and post-patch bashes can still be found in the wild, you might not want to rely on that.)

  2. Exported variables which contain a newline.

  3. In some cases, exported variables with no value at all. These actually have the value of an empty string, but it is possible for them to be in the environment list without an = sign, and some utilities will print them out without an =.
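As an illustration (the function name and the skipping rules here are my own, not part of any library), a line-based parser would have to work around at least the first and third of these cases explicitly:

```python
def parse_env_output(data):
    """Parse the line-oriented output of `env`, skipping exported bash
    functions (whose values start with "() {" and may continue over
    several lines) and tolerating variables printed without an "=".
    """
    env = {}
    in_function = False
    for line in data.splitlines():
        if in_function:
            # Inside a multi-line exported function body; skip lines
            # until the closing brace on a line of its own.
            if line == "}":
                in_function = False
            continue
        if "=" not in line:
            # A variable printed without "="; treat it as empty.
            env[line] = ""
            continue
        name, _, value = line.partition("=")
        if value.startswith("() {"):
            # An exported bash function (pre- or post-shellshock name);
            # ignore it rather than storing it.
            in_function = not value.rstrip().endswith("}")
            continue
        env[name] = value
    return env
```

Note that this still mishandles case 2, ordinary values that contain newlines, which is exactly why avoiding line-based parsing altogether, as below, is the more robust route.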

As always, the most robust (and possibly even simplest) solution would be to avoid parsing, but we can fall back on the strategy of parsing a formatted string we create ourselves, which is carefully designed to be parsed.

We can use any programming language with access to the environment to produce this output; for simplicity, we can use python itself. We'll output the environment variables in a very simple format: the variable name (which must be alphanumeric), followed by an equal sign, followed by the value, followed by a NUL (0) byte (which cannot appear in the value). Something like the following:

from subprocess import Popen, PIPE

# The commented-out line really should not be necessary; it's impossible
# for an environment variable name to contain an =. However, it could
# be replaced with a more stringent check.
prog = ( r'''from os import environ;'''
       + r'''from sys import stdout;'''
       + r'''stdout.write("\0".join("{0}={1}".format(*kv)'''
       + r'''                       for kv in environ.iteritems()'''
      #+ r'''                       if "=" not in kv[0]'''
       + r'''            ))'''
       )

# Lots of error checking omitted.    
def getenv_after_sourcing(fn):
  argv = [ "bash"
         , "-c"
         , '''. "{fn}"; python -c '{prog}' '''.format(fn=fn, prog=prog)]
  data = Popen(argv, stdout=PIPE).communicate()[0]
  return dict(kv.split('=', 1) for kv in data.split('\0'))
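For illustration (the sample values here are made up), the final split-based parse survives values that contain newlines, because entries are delimited by NUL rather than by line endings:

```python
# Simulated NUL-delimited dump, as the embedded "python -c" program
# above would produce it.
data = "PATH=/usr/bin\0MULTILINE=first line\nsecond line\0EMPTY="

# The same parsing step as in getenv_after_sourcing.
env = dict(kv.split("=", 1) for kv in data.split("\0"))

assert env["MULTILINE"] == "first line\nsecond line"  # newline survives
assert env["EMPTY"] == ""                             # empty value is kept
```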
rici

I think it is generally better to use bash directly to set the environment and then invoke the python script in the already set environment. This is taking advantage of one of the core unix/linux principles: a child process inherits a copy of the environment of the parent process.

If I understood your situation correctly, you have some bash scripts which set up an environment that you want available in your python scripts. Those python scripts then use that prepared environment to set up some more environment for some more tools.

I suggest the following setup:

  1. a bash wrapper

    • set the environment using bash scripts
    • invoke your python setup script (the python script inherits the environment from the bash script)
  2. your current python scripts sans the subprocess and environment reading

    • starts in environment prepared by bash script above
    • continue work to prepare environment for next tools

This way you can use each script in its "native environment".
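A minimal, self-contained sketch of the principle (the file /tmp/setup_demo.sh stands in for the real setup scripts, and python3 for whichever interpreter you use):

```shell
#!/bin/bash
# Stand-in for the real setup script(s); in practice you would
# source e.g. the setup.sh from the question instead.
cat > /tmp/setup_demo.sh <<'EOF'
export DEMO_VAR="set by bash"
EOF

# Source the environment, then hand off to Python; the child
# process inherits the environment prepared here.
. /tmp/setup_demo.sh
python3 -c 'import os; print(os.environ["DEMO_VAR"])'
```

Run as a wrapper, this prints the variable from inside Python, because the interpreter is started as a child of the shell that sourced the script.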

An alternative would be to translate the bash scripts to python manually.

Lesmana
  • This may or may not be possible in the general case. In my case, I need to load modules from python, depending on some programmatically defined cases, so your approach will not work for that – Davide Aug 25 '16 at 15:47