Once again programming dos batch commands with Python

There's a great answer to the question "Python threading multiple bash subprocesses?"

I was pretty sure this would solve my problem, but I've tried both the Popen and the multiprocessing methods and neither works.

Here's my problem: I want to set a unique value for a Windows environment variable (like TMP) in each process, so that process 1 writes to folder 1 and process 2 writes to folder 2 - and in terms of environment variables, process 1 won't see what process 2 sees.

Here's my code, based on the answer above. Variant 1 uses the Windows `set VAR=abc` method. Variant 2 uses Python's `os.environ['TMP'] = abc` method, and shouldn't work anyway because TMP is read by Python before cmd sets it. Neither variant 1 nor 2 works. Variant 3 is a sanity check; it works, but doesn't solve my problem.

from subprocess import Popen
import os

commands = [
   # variant 1: does not work
    'set TMP=C:\\Temp\\1 &&  echo abc1 > %TMP%\\1.log', 'set TMP=C:\\Temp\\2 &&  echo abc2 > %TMP%\\2.log' 

   # variant 2: does not work
   # 'set TMP=C:\\Temp\\1 &&  echo abc1 > '+os.environ['TMP']+'\\1.log', 'set TMP=C:\\Temp\\2 &&  echo abc2 > '+os.environ['TMP']+'\\2.log'

   # variant 3: works, but does not set TMP environmental variable
   # 'echo abc1 > C:\\Temp\\1\\1.log', 'echo abc2 > C:\\Temp\\2\\2.log'
]

# run in parallel
processes = [Popen(cmd, shell=True) for cmd in commands]
# do other things here..
# wait for completion
for p in processes: p.wait()
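(For what it's worth, variant 1 can never work: cmd.exe expands `%TMP%` when it parses the whole line, before `set` runs, so the redirect still sees the parent's old value. The isolation has to come from handing each child its own environment. A minimal sketch of that idea - `sys.executable` is a portable stand-in for the real command, and the paths are illustrative:)

```python
import os
import subprocess
import sys

# Give each child its own copy of the environment with a distinct TMP value.
procs = []
for name in ("1", "2"):
    env = os.environ.copy()  # snapshot of the parent environment
    env["TMP"] = os.path.join("C:\\Temp" if os.name == "nt" else "/tmp", name)
    # The child just prints its own TMP to show the values are independent.
    p = subprocess.Popen(
        [sys.executable, "-c", "import os; print(os.environ['TMP'])"],
        env=env,
        stdout=subprocess.PIPE,
    )
    procs.append(p)

outputs = [p.communicate()[0].decode().strip() for p in procs]
print(outputs)  # each child saw its own TMP value
```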

Here's my code for the multiprocessing method (the Python variable `commands` is defined in the script above):

from functools import partial
from multiprocessing.dummy import Pool
from subprocess import call
import os

pool = Pool(2) # two concurrent commands at a time
for i, returncode in enumerate(pool.imap(partial(call, shell=True), commands)):
    if returncode != 0:
       print("%d command failed: %d" % (i, returncode))

(I've also tried `set TMP="C:\Temp\1"`, with double quotes around the folder.)

(Python 2.7.13, 64-bit, on Windows 10)
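(A side note on why the thread-pool variant can't isolate TMP either: `multiprocessing.dummy.Pool` is a thread pool, and all threads share the single `os.environ` of the parent process. A small sketch, with a hypothetical variable name, that makes the sharing visible:)

```python
import os
from multiprocessing.dummy import Pool  # thread pool: workers share os.environ

os.environ["DEMO_TMP"] = "parent"  # hypothetical variable for illustration

def worker(value):
    os.environ["DEMO_TMP"] = value  # mutates the ONE shared environment
    return os.environ["DEMO_TMP"]   # may even see another thread's write

pool = Pool(2)
results = pool.map(worker, ["a", "b"])
# The parent's environment was changed by the threads - whichever wrote last wins.
print(os.environ["DEMO_TMP"])
```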


And, unrelated to the answer I referenced, I've also tried this function:

import os
from subprocess import check_output

def make_tmp(tmp_path):

    os.environ['TMP'] = tmp_path
    dos = 'echo '+tmp_path+' > '+os.environ['TMP']+'\\output.log'
    check_output(dos, shell=True)


from multiprocessing.dummy import Pool as ThreadPool
pool = ThreadPool(2)

print os.environ['TMP']
path_array = ['\"C:\\Temp\\1\"', '\"C:\\Temp\\2\"']
results = pool.map(make_tmp, path_array)

This fails with the following traceback:

Der Prozess kann nicht auf die Datei zugreifen, da sie von einem anderen Prozess verwendet wird.*
Traceback (most recent call last):
  File "C:\Users\project11\Dropbox\project11\code_current\dev_folder\google_sheets\test_windows_variables.py", line 32, in <module>
    results = pool.map(make_tmp, path_array)
  File "C:\ProgramData\Anaconda2\lib\multiprocessing\pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "C:\ProgramData\Anaconda2\lib\multiprocessing\pool.py", line 567, in get
    raise self._value
subprocess.CalledProcessError: Command 'echo "C:\Temp\1" > "C:\Temp\2"\output.log' returned non-zero exit status 1

* The process cannot access the file because it is being used by another process.


I also tried another answer to the same question, with no luck.

  • You want to update your environment before making your call, and then use the updated environment in `Popen()` as seen in https://stackoverflow.com/questions/2231227/python-subprocess-popen-with-a-modified-environment – JohanL Sep 11 '17 at 18:29
  • thanks. I want to modify the os.environ['TMP'] in the process - so I think the answer you reference won't work – philshem Sep 11 '17 at 18:31
  • OK, why do you want to do that? To use the updated value in your original program? Because that cannot be done by the use of a subprocess. – JohanL Sep 11 '17 at 18:32
  • The full python script wraps around a Windows executable that writes a temp file `data.bin` to the TMP folder. I want to run multiple processes of this Windows executable, but they all reference the same TMP folder and `data.bin` file. That's why I want to update TMP for each process. – philshem Sep 11 '17 at 18:34
  • I still don't see why you cannot update the environment in your main script and then pass it to the different subprocess calls. You can give a separate copy to each of them, with different values of `TMP`. – JohanL Sep 11 '17 at 18:36
  • I want the processes/sessions to run in parallel. It would be great if you could provide an example! Thanks! – philshem Sep 11 '17 at 18:42
  • `Popen` takes an `env` argument that lets you specify the environment to use; don't use `set` in the command itself. – chepner Sep 11 '17 at 19:03
  • @chepner this works, do you want to add it as an answer so I can accept? Had to add PIPE and shell=True from here https://stackoverflow.com/a/36249753/2327328 – philshem Sep 11 '17 at 20:12
  • @philshem Well, that is what I tried to make you do as well, if you had looked more carefully at the answer I linked to. ;-) – JohanL Sep 11 '17 at 20:19
  • yes, that would have saved time. Thanks – philshem Sep 11 '17 at 20:22

2 Answers


Here's the working code, thanks to the comments from chepner and JohanL:

import subprocess, os

def make_tmp(tmp_path):

    my_env = os.environ.copy()
    my_env['TMP'] = tmp_path
    dos = 'echo '+tmp_path[-1]+' > '+my_env['TMP']+'\\output.log'  # tmp_path[-1] is the folder's last character, e.g. '1'
    # or run a bat script
    # dos = 'C:\\Temp\\launch.bat'
    subprocess.Popen(dos, env=my_env, stdout=subprocess.PIPE, shell=True)

from multiprocessing.dummy import Pool as ThreadPool 
pool = ThreadPool(2) 

path_array = ['C:\\Temp\\1', 'C:\\Temp\\2', 'C:\\Temp\\3']

results = pool.map(make_tmp, path_array)

Note that the folders in `path_array` must already exist.
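One refinement worth noting: `Popen` returns immediately, so `pool.map` can finish before the children have actually written anything. A sketch of the same pattern that blocks on each child via `communicate()` - here `sys.executable` stands in for the real Windows executable, and the paths are only passed as environment values, so nothing is written to disk:

```python
import os
import subprocess
import sys
from multiprocessing.dummy import Pool as ThreadPool

def make_tmp(tmp_path):
    my_env = os.environ.copy()
    my_env['TMP'] = tmp_path
    # Portable stand-in for the real command: the child prints its own TMP.
    dos = [sys.executable, "-c", "import os; print(os.environ['TMP'])"]
    proc = subprocess.Popen(dos, env=my_env, stdout=subprocess.PIPE)
    out, _ = proc.communicate()  # block until this child is done
    return proc.returncode, out.decode().strip()

pool = ThreadPool(2)
path_array = ['C:\\Temp\\1', 'C:\\Temp\\2', 'C:\\Temp\\3']
results = pool.map(make_tmp, path_array)
print(results)  # list of (returncode, TMP-as-seen-by-child) pairs
```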

  • This should not require `subprocess.PIPE`. If your use case is only to redirect the output of your command to a certain file, you do not need to use a specific environment at all, though. That is better done in Python itself (which, then, would require `subprocess.PIPE` but not `shell=True`). – JohanL Sep 12 '17 at 03:31

If your use case is only to capture the output to a file, you can let Python do that for you, by redirecting the command output to a `subprocess.PIPE` and then storing the data from Python. This has the advantage that you do not need `shell=True`, which saves you a possible vulnerability and an extra process creation. That could be written as:

import subprocess, os

def make_tmp(tmp_path):
    dos = ['echo', tmp_path[-1]]
    outfile = os.path.join(tmp_path, 'output.log')
    cmd_proc = subprocess.Popen(dos, stdout=subprocess.PIPE)
    with open(outfile, 'w') as f:
        f.write(cmd_proc.communicate()[0])

from multiprocessing.dummy import Pool as ThreadPool 
pool = ThreadPool(2) 

path_array = ['C:\\Temp\\1', 'C:\\Temp\\2', 'C:\\Temp\\3']

results = pool.map(make_tmp, path_array)

Caveat: I have not had the time to test above code, so there could be some minor error.
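Alternatively, the intermediate `PIPE` can be skipped entirely by handing the open file object to the child as its stdout, so the child writes straight into the log file. A sketch under the same assumptions - `sys.executable` running a trivial command stands in for the real program, and `tempfile` stands in for an existing `C:\Temp\1`:

```python
import os
import subprocess
import sys
import tempfile

def make_tmp(tmp_path):
    # Redirect the child's stdout straight into the log file;
    # no PIPE and no shell=True needed.
    outfile = os.path.join(tmp_path, "output.log")
    cmd = [sys.executable, "-c", "print('abc')"]  # stand-in for the real command
    with open(outfile, "w") as f:
        subprocess.call(cmd, stdout=f)
    return outfile

tmp_dir = tempfile.mkdtemp()        # stand-in for C:\Temp\1
log = make_tmp(tmp_dir)
print(open(log).read().strip())     # -> abc
```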

  • Thanks for the answer. Actually the echo is a test case; the full Python script wraps around a Windows executable that writes a temp file data.bin to the TMP folder. I want to run multiple processes of this Windows executable, but they all reference the same TMP folder and data.bin file. That's why I want to update TMP for each process. – philshem Sep 12 '17 at 06:47