Loading environment modules on a cluster from my Python script

Question

I need to submit a Python job to a server. While it runs, I need to load and unload modules, since it calls several programs whose dependencies conflict, e.g. gcc versus intel.

This question has been asked before, but the answers have not worked for me in this situation:

Loading environment modules within a python script

I have tried using the following:

import subprocess as sub
cmd = 'module load intel/2016.4'
p = sub.Popen(cmd, shell=True, stderr = sub.STDOUT, stdout = sub.PIPE).communicate()[0] 
print(p.decode()) # this simply outputs to screen

The output says that the modules have been switched:

Lmod is automatically replacing "gcc/5.4.0" with "intel/2016.4".

Due to MODULEPATH changes, the following have been reloaded:
  1) openmpi/2.1.1

However, when I run 'module list' from the terminal, the modules have not been switched: gcc/5.4.0 is still loaded. The program requiring intel/2016.4 also fails to run. For instance, later I want to use a version of gromacs that requires intel/2016.4, and it fails.

I am a little confused, since I thought I could run bash commands via Popen, and 'module load' is a bash command. I don't want to write a bash script for this; there are a lot of other things in my script that are done much more conveniently in Python than in bash.

When you perform the `Popen` call, Python spawns a new subshell, runs the module command inside that subshell, and finally closes that subshell, losing the changes you just made. This is why you cannot see the changes once it has finished, and why a further command works if it is started in the same subshell. – Poshi, Jan 11 '19 at 13:17
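The effect described in this comment can be reproduced without any module system. The following is a minimal sketch, assuming a POSIX shell is available; `DEMO_MODULE_VAR` is a hypothetical variable name chosen only for the demonstration and stands in for whatever `module load` would change.

```python
import os
import subprocess

# Hypothetical variable name, used only for this demonstration.
VAR = "DEMO_MODULE_VAR"
os.environ.pop(VAR, None)  # make sure it is not already set

# The export happens inside the child shell and disappears when that shell exits.
subprocess.run(f"export {VAR}=intel", shell=True, check=True)
parent_sees = os.environ.get(VAR)

# A second command chained in the same shell invocation does see the change.
same_shell = subprocess.run(
    f"export {VAR}=intel; echo ${VAR}",
    shell=True,
    check=True,
    capture_output=True,
    text=True,
).stdout.strip()

print(parent_sees)  # None
print(same_shell)   # intel
```

The parent Python process never sees the variable, while the chained `echo` does, which mirrors why `module load` followed by a separate `Popen` call has no lasting effect.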

score 3 · Charlie Crown · Accepted Answer · 2019-01-10T01:31:50.457

I recently ran into this. A simple way around it is to put the module loads before the command you want to run, in the same command string, separated by semicolons:

import subprocess as sub
# Load the module and run the program in the same shell invocation
cmd = 'module load intel/2016.4; "gromacs command"'
p = sub.Popen(cmd, shell=True, stderr=sub.STDOUT, stdout=sub.PIPE).communicate()[0]

where "gromacs command" stands for however you would normally call gromacs. If you check your module list after running the script, intel/2016.4 won't show up as loaded, but gromacs will run from inside your Python script using intel/2016.4, which is what you want.
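If several commands need the same prefix, the string building can be wrapped in a small helper. This is only a sketch: `with_modules` and `run_with_modules` are hypothetical names, and it assumes the `module` command is available in the shell that `subprocess` starts, as it was in the question.

```python
import shlex
import subprocess


def with_modules(modules, command):
    """Return a shell command line that loads `modules`, then runs `command`."""
    loads = [f"module load {shlex.quote(m)}" for m in modules]
    return "; ".join(loads + [command])


def run_with_modules(modules, command):
    """Run `command` in a single shell invocation after loading `modules`."""
    result = subprocess.run(
        with_modules(modules, command),
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr into stdout, as in the answer
        text=True,
    )
    return result.stdout


# Only build and show the command line here; nothing is executed.
print(with_modules(["intel/2016.4"], "which icc"))
```

Because each call starts a fresh shell, the modules are loaded again on every call to `run_with_modules`.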

edited Jan 10 '19 at 01:31

answered Jan 10 '19 at 01:10

So if a given module is getting called multiple times, does it need to be loaded in-command every time? – semblable Mar 26 '21 at 16:34

score 2 · Answer 2 · answered Jan 16 '19 at 08:11

Most environment module implementations have a Python init script that comes in very handy. For Lmod, it is in $LMOD_DIR/../init and is named env_modules_python.py. So you can do this:

$ export PYTHONPATH=${PYTHONPATH}:$LMOD_DIR/../init
$ python
Python 2.7.5 (default, Jul 13 2018, 13:06:57)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from env_modules_python import module

and from there you can run any 'module' command directly in Python.

>>> module('list')

Currently Loaded Modules:

[...]
  3) StdEnv                                             (H)
  4) GCCcore/6.4.0                                      (H)
  5) binutils/2.28-GCCcore-6.4.0                        (H)
[...]

It modifies the environment of the Python process itself, and that environment is inherited by any subshells it starts:

>>> import os
>>> os.system("which icc")
which: no icc in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
256
>>> module("load intel")
>>> os.system("which icc")
/opt/[...]/icc/2017.4.196-GCC-6.4.0-2.28/compilers_and_libraries_2017.4.196/linux/bin/intel64/icc
0

It works the same with Popen:

>>> import subprocess as sub
>>> cmd='which icc'
>>> p = sub.Popen(cmd, shell=True, stderr = sub.STDOUT, stdout = sub.PIPE).communicate()[0]
>>> print(p.decode())
/opt/[...]icc/2017.4.196-GCC-6.4.0-2.28/compilers_and_libraries_2017.4.196/linux/bin/intel64/icc
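The interactive session above can also be folded into a script, so that `PYTHONPATH` does not have to be exported beforehand. This is a rough sketch under a few assumptions: `LMOD_DIR` is set in the job's environment, `module` accepts the sub-command and its arguments as separate strings, and `lmod_init_dir` is a hypothetical helper name.

```python
import os
import subprocess
import sys


def lmod_init_dir(environ=os.environ):
    """Return $LMOD_DIR/../init, or None if LMOD_DIR is not set."""
    lmod_dir = environ.get("LMOD_DIR")
    if lmod_dir is None:
        return None
    return os.path.join(lmod_dir, "..", "init")


init_dir = lmod_init_dir()
if init_dir is None:
    print("LMOD_DIR is not set; skipping module setup")
else:
    # Make env_modules_python importable, as the PYTHONPATH export did above.
    sys.path.append(init_dir)
    from env_modules_python import module

    # Assumption: sub-command and module name are passed as separate arguments.
    module("load", "intel/2016.4")
    # Child processes started from here on inherit the modified environment.
    subprocess.run("which icc", shell=True)
```

Since the environment of the Python process is changed, later `module("unload", ...)` and `module("load", ...)` calls can switch toolchains between the programs the script runs.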
