
I have a python3 project with the following directory structure:

project/
    run.py
    package/
        a.py
        b.py
        constants.py

Modules a and b use various common variables/hyperparameters. I need to run multiple instances of the project on a cluster, each with a different set of hyperparameters. I submit jobs to the cluster, which then schedules them.

I have tried the following:

1. I had a constants.py inside the package, which I modified before submitting every job. Let's say I want to run 5 different sets of hyperparameters. The problem with this approach is that the cluster takes a while to schedule my jobs, and when it finally does, all the jobs will use the last-modified parameters (i.e. for the 5th run) stored in constants.py and not the 5 different sets that I wanted.

2. Next, I used argparse in run.py, but couldn't pass the arguments to a and b inside the package, despite trying various approaches like the ones in this SO thread.

So the hack that I had to resort to was to use argparse in run.py, import constants in run.py, re-initialise them there, and then import the constants wherever I need them in a and b. This way, I can write multiple sh scripts with different command-line arguments for run.py and schedule them all on the cluster.
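
For concreteness, the hack looks roughly like this (the hyperparameter names here are just placeholders):

run.py

import argparse
from package import constants

parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=0.01)        # placeholder hyperparameter
parser.add_argument('--batch_size', type=int, default=32)    # placeholder hyperparameter
args = parser.parse_args()

# 're-initialise' the shared constants before a and b read them
constants.lr = args.lr
constants.batch_size = args.batch_size

from package import a, b  # a and b do `from . import constants`

This works because a and b access the values as attributes (constants.lr), so they see the re-initialised values rather than the ones written in the file.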

I'm sure there should be a better (and more pythonic) way to do this. Suggestions? Thanks.

anaik
  • That SO thread you linked has no connection to the `argparse` module; it's about an issue with importing modules/packages. Are you having issues with `argparse` or with your project organization and imports? If it's the latter, adding `__init__.py` files in both folders lets you import either module (or parts of it, like the argparse values) into the other without much trouble. [Here's](https://docs.python-guide.org/writing/structure/) a good read on how to structure projects. – ljetibo Dec 03 '18 at 23:13
  • Yes, using that SO thread I was trying to import the result of `argparse` (the `args` variable) into `a` and `b`, albeit unsuccessfully. There are no issues with using `argparse`, only with the imports. Thanks a lot for the structuring link, I'll check it out to see if there is some improvement I need. – anaik Dec 03 '18 at 23:55
  • In a simple configuration, `run.py` imports `a` and `b`. It runs the parser to get an `args` namespace. Ideally `a` would have a function, possibly `main`, that takes values, e.g. `a.main(args.foo, args.bar, ....)`, etc. – hpaulj Dec 04 '18 at 00:37
  • Right, but I require passing/importing the entire `args` variable from the `main` in `run.py` to the inner `a` and `b`, so that I can pass to their individual functions whichever command-line arguments they require. Hope this adds more clarity to the question. – anaik Dec 04 '18 at 22:26

2 Answers


Since I'm not quite following, here's at least a start of an MCVE. Project dir structure:

project/
    __init__.py
    run.py
    package/
        __init__.py
        a.py
        b.py
        constants.py

Starting from package directory (the inner one) I have:

__init__.py

from .a import ModelA
from .b import ModelB

a.py

from . import constants

class ModelA:
    def __init__(self, a, b):
        print("Instantiating Model A with {0} {1}".format(a, b))
        print("    Pi:{0} hbar{1}".format(constants.pi, constants.hbar))

b.py

from . import constants

class ModelB:
    def __init__(self, a, b):
        print("Instantiating Model B with {0} {1}".format(a, b))
        print("    Pi:{0} hbar{1}".format(constants.hbar, constants.pi))

constants.py

hbar = 1
pi = 3.14

Note that the init has content purely for the convenience of importing project as a package and having the ModelA and ModelB names available under it immediately. I could have just as easily left an empty __init__.py file and then done from project.package.a import ModelA.

Much like how it is done in the project dir: the top dir (project) has an empty __init__.py file and a:

run.py

#!/usr/bin/env python

import argparse
import package

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('--param1', dest='param1')
parser.add_argument('--param2', dest='param2')

args = parser.parse_args()

package.ModelA(args.param1, args.param2)
package.ModelB(args.param2, args.param1)

Note that the shebang line might not be necessary; it is situation-dependent when running on a cluster, where environment management can play a role.

Running this from the terminal should get you

$:> python3 project/run.py --param1 10 --param2 100
Instantiating Model A with 10 100
    Pi:3.14 hbar1
Instantiating Model B with 100 10
    Pi:1 hbar3.14

Now take this and improve what you have, or reconstruct what you're trying to do in simplified terms like these (hopefully breaking the example), and then post back which part isn't working and why.


Edit

Let me preface the solution with a statement that doing things like this is setting yourself up for failure. What it looks like to me is that you have a run.py file in which you want to parse the arguments sent in over the terminal and use them to set the global state of your program execution. You should almost never do that (the only exception I know of is that sometimes, and only sometimes, there is use in setting global state for an engine or a session when connecting to a database, but even then it's usually not the only or the best solution).

This is the very reason why you want modules and packages. There should be no reason why you cannot parse your input in run.py and **call** the functionality in any of your submodules. How many parameters the functions in a and b take, or whether they use all or none of the sent parameters, makes literally no difference. You could edit the above example so that classes A and B need only 1, 3, or 10 parameters, or A takes 5 and B none, and it would still work, as the sketch below shows.
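
For example, here is a hypothetical variation of run.py, under the assumption that ModelA's __init__ was changed to take a single parameter and ModelB's to take none:

run.py

#!/usr/bin/env python
import argparse
import package

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('--param1', dest='param1')
parser.add_argument('--param2', dest='param2')

args = parser.parse_args()

# pass along only the values each model actually needs
package.ModelA(args.param1)  # assumes ModelA now takes one argument
package.ModelB()             # assumes ModelB now takes none

The submodules never need to see the args namespace itself; they receive plain values. That said, since you asked specifically how to import the args namespace itself into a and b: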

a.py

from . import constants
from project.run import args

print("from A", args)

class ModelA:
    def __init__(self, a, b):
        print("Instantiating Model A with {0} {1}".format(a, b))
        print("    Pi:{0} hbar{1}".format(constants.pi, constants.hbar))

b.py

from . import constants
from project.run import args

print("from B", args)

class ModelB:
    def __init__(self, a, b):
        print("Instantiating Model B with {0} {1}".format(a, b))
        print("    Pi:{0} hbar{1}".format(constants.hbar, constants.pi))

run.py

#!/usr/bin/env python
import argparse
from . import package

parser = argparse.ArgumentParser(description='Process some integers.')
parser.add_argument('--param1', dest='param1')
parser.add_argument('--param2', dest='param2')

args = parser.parse_args()

if __name__ == "__main__":
    package.ModelA(args.param1, args.param2)
    package.ModelB(args.param2, args.param1)

And then invoke it as

$:> python3 -m project.run --param1 10 --param2 100
from A Namespace(param1='10', param2='100')
from B Namespace(param1='10', param2='100')
Instantiating Model A with 10 100
    Pi:3.14 hbar1
Instantiating Model B with 100 10
    Pi:1 hbar3.14

Notice that not much has changed, almost nothing really: we imported args from run.py by using absolute import paths instead of relative ones, and we moved the execution code into if __name__ == "__main__": so that it does not get called on every import, only when that script is run as the "main" program.

The only bigger and more important difference is the invocation of the script. The terminal command python3 -m project.run will import the module and then run it as a script, whereas the previously used python3 project/run.py will just try to run run.py as a script. When the module is first imported, its __package__ value gets set; once __package__ is set, Python knows where the module sits in the package hierarchy, and thus the from project.run import ... statements work because Python now knows where it has to go to look for those values. When run.py is run as just a script, however, Python goes into the package/ directory when the import package statement gets invoked, but it does not know that it might need to go back up one level (to run.py) to search for values defined there and import them back down into the package/ dir.
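
If you want to see the difference for yourself, a quick (hypothetical) check is to add one line at the top of run.py:

print("__name__ =", __name__, "| __package__ =", repr(__package__))

Invoked as python3 -m project.run this prints __package__ = 'project', while python3 project/run.py prints __package__ = None, and the from . import package statement then fails with an ImportError about a relative import (the exact message varies across Python versions).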

ljetibo
  • Thank you for the MCVE. While it helps, what I'm looking for is more along the lines of how to pass/import the `args` variable itself from the outer `run.py` into the inner `a` and `b`. They have a bunch of functions which use various (but not all) variables that are being passed as command-line arguments. Any suggestions for doing that? – anaik Dec 04 '18 at 22:21
  • @AbhishekNaik Edited my answer to answer your question. – ljetibo Dec 05 '18 at 02:23

I would suggest using environment variables that can be specified per instance.

Maybe don't have actual constants; at the least, have some config.py that reads them from the environment:

import os

my_val = os.environ.get('MY_VAL', 'default value')

Then, when you run your code on multiple instances, you would need to export the appropriate variables before each execution.
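
For example (assuming a POSIX shell), each of the submission scripts mentioned in the question would export a different value before invoking the program:

$:> export MY_VAL="value for job 1"
$:> python3 run.py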

If you were to containerize your application with Docker, you'd be able to pass in -e MY_VAL="some value" and it would be picked up by the code in the same way.
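
For example, with a hypothetical image name:

$:> docker run -e MY_VAL="some value" my-project-image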

OneCricketeer