Using subprocess to automate repeated executions of a computationally intensive program

Question

What I am trying to do

I am using a program called MESA (https://docs.mesastar.org/en/latest/index.html) and the relevant steps for each run are:

Edit a few lines with input parameters in a text file
Execute the (bash) shell command “./mk”
Execute the (bash) shell command “./rn”

After successful completion of rn these steps are repeated for each iteration.

My implementation

In order to automate these steps I came up with the following program:

import subprocess 

inputs[n][5] #2d array imported from csv

for i in range(len(inputs)):

    #read data
    with open('inlist', 'r', encoding='utf-8') as file:
            data = file.readlines()

    #lines to change
    data[ 73] = “   RSP_mass = ” + inputs[i][0] + “d0\n”
    data[ 74] = “   RSP_Teff = ” + inputs[i][1] + “d0\n”
    data[ 75] = “   RSP_L = ”+ inputs[i][2] + “d0\n”

    data[ 99] = “   log_directory = 'LOGS/” + inputs[i][3] + “'\n”
    data[100] = “   photo_directory = 'PHOTOS/” + inputs[i][4] + “'\n”

    #write data
    with open('inlist', 'r', encoding = 'utf-8') as file:
        file.writelines()

    #running MESA
    subprocess.run(“./mk”)
    subprocess.run(“./rn”, stdout = subprocess.PIPE)

Issue 1:

Since MESA is very computationally intensive (it uses all 16 available threads) and already takes 2½ to 3 hours per run, I am quite worried about possible performance issues. Because each run takes so long, it's also quite difficult to benchmark.

Is there a better solution available that I have missed?

Issue 2: During a run, MESA writes a little less than 1000 lines to stdout, which I assume will cause quite a slowdown when running via subprocess. The easiest way would of course be to disable all output; however, it is quite useful to be able to check the evolution process during runs, so I would like to keep it if possible. From the thread "Python: Reading a subprocess' stdout without printing to a file", I have learned that stdout=subprocess.PIPE would be the fastest way of doing so. Storing the output data is already handled by MESA itself. Is this a good solution with regard to performance?

Issue 3: This is the least important of the issues; however, it might affect the implementation of the prior ones, so I thought I would ask about it as well. Is it possible to define a custom keyboard interrupt which doesn't terminate the program immediately, but only once the current run has completed? Based on the thread "How to generate keyboard events?", I would assume the keyboard library would be best suited for Ubuntu.

Please focus on one question per post. I can try to answer the Python parts but I know nothing about MESA. — tripleee, May 31 '22 at 17:32
Also, your code seems to have invalid syntax such as "typographical" double quotes. Please edit to post exactly the code you are asking about, ideally as a minimal reproducible example. — tripleee, May 31 '22 at 17:33
Having your subprocesses write their output to a separate log file would allow Python to get out of the way completely. — tripleee, May 31 '22 at 17:35
I am sorry if I made it unclear by adding that the program is MESA; my question is completely independent of the program used. My main issue is whether or not subprocess is able to run a program which uses multiple cores without significant slowdown. — Alpha_Ursae_Minoris, May 31 '22 at 19:01
Maybe I made it difficult to understand as well, but if I run MESA without the use of subprocess, it prints some of the output to stdout during the run (which I would also like to see when using subprocess, with minimal performance loss) and it stores the output itself to log files. That's why I need to change the log and photo directory data[99] and data[100] as shown in the code, which is why I included them in the reproducible example. — Alpha_Ursae_Minoris, May 31 '22 at 19:16
Thank you for the template file suggestion, that was what I was trying to achieve. Would placing the read data outside of the loop be enough to achieve this? If not, could you please point me towards an example of such an implementation? — Alpha_Ursae_Minoris, May 31 '22 at 19:20

tripleee · Accepted Answer · 2022-06-01T03:13:00.473

Repeatedly reading and rewriting the input file is clumsy and inefficient, and anyway, you can't write to it when you open it in read-only mode ('r').

I would instead read a template file, once, then write the actual configuration file based on that. (Python has a separate Template class in the standard library, string.Template, which would perhaps be worth looking into, but this is simple enough to write from scratch.)
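As a rough illustration of the standard-library route mentioned above, here is a minimal sketch using string.Template. The placeholder names (mass, teff, lum, log_dir, photo_dir), the inline template text, and the sample values are assumptions made up for this example; a real template would be the full inlist with placeholders where the values change. A literal dollar sign in the template would have to be written as $$.

```python
from string import Template

# Hypothetical template text; in practice this would be the complete inlist
# read from a file, with $-placeholders in place of the per-run values.
TEMPLATE_TEXT = """\
   RSP_mass = ${mass}d0
   RSP_Teff = ${teff}d0
   RSP_L = ${lum}d0
   log_directory = 'LOGS/${log_dir}'
   photo_directory = 'PHOTOS/${photo_dir}'
"""


def render_inlist(template_text, row):
    """Fill the placeholders from one row of the inputs array."""
    mass, teff, lum, log_dir, photo_dir = row
    return Template(template_text).substitute(
        mass=mass, teff=teff, lum=lum, log_dir=log_dir, photo_dir=photo_dir
    )


# Made-up sample row, only to show the substitution.
rendered = render_inlist(TEMPLATE_TEXT, ["0.65", "6500", "60", "run1", "run1"])
print(rendered)
```

Compared with assigning to fixed line numbers, named placeholders keep working if lines are added to or removed from the template, and substitute() raises KeyError when a placeholder is left unfilled.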

Python only starts the subprocess and waits for it to finish; the computation itself happens entirely in the child process, so running your tasks from Python should perform the same as running them from the shell.

If you have no reason to capture the output from the process, just let it spill onto the user's terminal directly. Not specifying anything for stdout= and stderr= in the subprocess call achieves that.
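The two ways of keeping Python out of the output path can be sketched as follows. The command here is a stand-in that prints two lines, used only so the example runs anywhere, and the log file name is hypothetical. Note that with subprocess.run, stdout=subprocess.PIPE collects the output in memory and only hands it over once the process has finished, so it would not show progress during a run.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for "./rn" so the sketch runs anywhere; it just prints two lines.
cmd = [sys.executable, "-c", "print('step 1'); print('step 2')"]

# 1) Inherit the terminal: the child writes straight to the screen and
#    Python never touches the output.
subprocess.run(cmd, check=True)

# 2) Redirect to a log file: the child again writes directly to the file,
#    which can be followed from another terminal with `tail -f`.
log_path = Path(tempfile.gettempdir()) / "rn_example.log"  # hypothetical name
with open(log_path, "w", encoding="utf-8") as log:
    subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT, check=True)

log_text = log_path.read_text(encoding="utf-8")
```

In both variants the child writes to its destination itself, so the amount of output should not add any measurable work on the Python side.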

import subprocess 

# inputs[n][5] #2d array imported from csv

with open('template', 'r', encoding='utf-8') as file:
    data = file.readlines()

for inp in inputs:
    data[ 73] = f"   RSP_mass = {inp[0]}d0\n"
    data[ 74] = f"   RSP_Teff = {inp[1]}d0\n"
    data[ 75] = f"   RSP_L = {inp[2]}d0\n"

    data[ 99] = f"   log_directory = 'LOGS/{inp[3]}'\n"
    data[100] = f"   photo_directory = 'PHOTOS/{inp[4]}'\n"

    with open('inlist', 'w', encoding='utf-8') as file:
        file.writelines(data)

    subprocess.run("./mk", check=True)
    subprocess.run("./rn", check=True)

Notice how this now reads from a file called template, once outside the loop, and then writes ('w') to inlist repeatedly. I also fixed the loop to be a bit more idiomatic, and changed the curly double quotes to proper ASCII double quotes. The replacements now use f-strings for (IMHO) improved legibility. Down near the end, the check=True keyword argument to subprocess.run instructs Python to raise an exception (subprocess.CalledProcessError) if the subprocess exits with a non-zero status.
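To make the effect of check=True concrete, here is a small sketch with a stand-in command that exits with status 3. The ACCEPTED set is a hypothetical example of treating particular exit statuses as success; which statuses a given program actually returns has to be checked for that program.

```python
import subprocess
import sys

# Stand-in for a command that exits with a non-zero status.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

# With check=True, a non-zero exit status raises CalledProcessError.
try:
    subprocess.run(failing, check=True)
    caught = None
except subprocess.CalledProcessError as exc:
    caught = exc.returncode

# Without check=True nothing is raised; inspect returncode yourself.
ACCEPTED = {0, 3}  # hypothetical set of statuses treated as success
result = subprocess.run(failing)
ok = result.returncode in ACCEPTED
```

Only the exit status matters here; anything the program prints to stdout or stderr plays no part in the check.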

The keyboard interrupt idea sounds unnecessarily challenging. You can add a signal handler to selectively ignore some signals, but a much simpler solution would be to just check within the loop whether any regular key (or a specific one, say q) has been pressed. See e.g. "How to detect key presses?"
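For completeness, the signal-handler variant could look roughly like the sketch below: the handler only records that a stop was requested, and the loop checks the flag after each run. The names request_stop, run_all and fake_run are made up for this example, and fake_run raises the signal itself in place of a real Ctrl+C. One caveat: a Ctrl+C typed at the terminal is normally delivered to the running child process as well, so the child would need to be shielded from it (for instance by passing start_new_session=True to subprocess.run on POSIX systems) for this to let a run finish.

```python
import signal

stop_requested = False


def request_stop(signum, frame):
    """Record the request instead of raising KeyboardInterrupt."""
    global stop_requested
    stop_requested = True


def run_all(inputs, run_one):
    """Call run_one(row) for each row; stop after the current run if asked."""
    completed = []
    previous = signal.signal(signal.SIGINT, request_stop)
    try:
        for row in inputs:
            run_one(row)
            completed.append(row)
            if stop_requested:
                break
    finally:
        signal.signal(signal.SIGINT, previous)  # restore the old handler
    return completed


def fake_run(row):
    """Stand-in for one run; simulates Ctrl+C arriving during run 'b'."""
    if row == "b":
        signal.raise_signal(signal.SIGINT)  # requires Python 3.8+


done = run_all(["a", "b", "c"], fake_run)
print(done)
```

The run during which the signal arrives still completes and is recorded; only the remaining rows are skipped.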

One quick follow-up question: I didn't include `check = True` on purpose, since MESA will report an error even on successful completion, as it terminates once it exceeds a certain range. Unfortunately this "error message" is very dependent on the input and therefore changes with each loop iteration, so it couldn't just be hard-coded to check for only the "successful error message". If I understood correctly, check = True checks that there were no errors during execution and therefore I would need to omit it. Is this correct? — Alpha_Ursae_Minoris, Jun 01 '22 at 07:04
You might be confusing the status code with the existence of error messages; these are two separate things. For example, `grep` will exit with status code 1 when it could not find any matches, but not emit any error message. This is what `check=True` checks for. — tripleee, Jun 01 '22 at 07:06
If you need more granular checks, take out the `check=True` and examine whether the `subprocess` returned a specific `returncode`. For example, run `p = subprocess.run(["grep", "foo"], input="bar", text=True)` and then `if p.returncode != 1: raise subprocess.CalledProcessError(p.returncode, p.args)` — tripleee, Jun 01 '22 at 07:09
For (much) more, see also https://stackoverflow.com/questions/4256107/running-bash-commands-in-python/51950538#51950538 — tripleee, Jun 01 '22 at 07:10
