
I would like to know what the best practice is when you want to "return" something from a Python script.

Here is my problem. I'm running a Python child script (child.py) from a parent script (parent.py) using the subprocess.Popen method, and I would like to get a tuple of two floats back from the child script.

Now, the first method I have seen uses sys.stdout and a pipe in the subprocess call, as follows:

child.py:

import sys

if __name__ == '__main__':
    # x and y are the two floats computed earlier
    my_tuple = (x, y)
    sys.stdout.write(str(my_tuple[0]) + ":" + str(my_tuple[1]))
    sys.stdout.flush()

parent.py:

import subprocess
import sys

p = subprocess.Popen([sys.executable, "child.py"], stdout=subprocess.PIPE)
out, err = p.communicate()

I have read that this is not recommended in most cases, but I don't know why...

The second way would be to write my tuple into a text file in child.py and read it back in parent.py. But I guess writing and reading a file takes some time, so I don't know whether it is a better way to do it.
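For concreteness, here is what I have in mind for the file variant (the filename result.txt and the two values are just placeholders for illustration):

```python
# child.py -- write the two floats to a text file, one per line
x, y = 1.5, 2.5  # placeholder values; in reality these are computed
with open('result.txt', 'w') as f:
    f.write('%r\n%r\n' % (x, y))

# parent.py -- read them back once the child has finished
with open('result.txt') as f:
    x, y = (float(line) for line in f)
```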

Finally, I could use cPickle to dump my tuple from child.py and load it in parent.py. I guess that would be a bit faster than using a text file, but would it be better than using sys.stdout?
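The pickle variant would be along these lines (result.pkl and the values are again placeholders; the module is cPickle on Python 2 and pickle on Python 3):

```python
import pickle  # use cPickle instead for speed on Python 2

# child.py -- serialize the tuple to a file
my_tuple = (1.5, 2.5)  # placeholder values
with open('result.pkl', 'wb') as f:
    pickle.dump(my_tuple, f)

# parent.py -- deserialize once the child has finished
with open('result.pkl', 'rb') as f:
    x, y = pickle.load(f)
```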

What would be the proper way to do this?

---------------------------------------EDIT------------------------------------------------

I forgot to mention that I cannot use import, since parent.py actually generates child.py in a folder. Indeed, I am doing some multiprocessing.

parent.py creates, say, 10 directories, and child.py is copied into each of them. Then I run each child.py from parent.py on several processors, and I want parent.py to gather the results "returned" by all the child.py instances. So parent.py cannot import child.py since it has not been generated yet, or maybe I can do some sort of dynamic import? I don't know...
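In case it helps, the kind of dynamic import I am thinking of would look like this on Python 3 (using importlib; it assumes each copied child.py exposes a function, here called compute(), which is my own naming):

```python
import importlib.util

def load_module(path, name='child'):
    """Import a module from an arbitrary file path (Python 3)."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# hypothetical usage, once dir1/child.py exists and defines compute():
#     child = load_module('dir1/child.py')
#     x, y = child.compute()
```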

---------------------------------------EDIT2-----------------------------------------------

Another edit to answer a question about why I proceed this way. child.py actually calls IronPython and another script to run a .NET assembly. The reason why I HAVE to copy all the child.py files into specific folders is that this assembly generates a resource file which it then uses. If I don't copy child.py (and the assembly, by the way) into each subfolder, the resource files are created at the root, which creates conflicts when I launch several processes using the multiprocessing module. If you have some suggestions about this overall architecture, they are more than welcome :).

Thanks

Serge
  • unrelated: in general, `script1.py`, `script2.py`, `float1`, `float2` are not good names. You could use `child.py`, `parent.py` for the scripts if nothing more specific comes to mind and `numbers` or `x, y` for the numbers. – jfs Oct 30 '14 at 14:24
  • Ok I'm editing/complementing it right now. Thanks! – Serge Oct 30 '14 at 14:31

1 Answer

Ordinarily, you should use import other_module and call its functions:

import other_module

x, y = other_module.some_function(param='z')

If you can run the script, you also can import it.

If you want to use subprocess.Popen(), then to pass a couple of floats you could use the json format: it is human-readable, machine-readable, and (in this case) exact. For example:

child.py:

#!/usr/bin/env python
import json
import sys

numbers = 1.2345, 1e-20
json.dump(numbers, sys.stdout)

parent.py:

#!/usr/bin/env python
import json
import sys
from subprocess import check_output

output = check_output([sys.executable, 'child.py'])
x, y = json.loads(output.decode())

Child.py actually calls ironpython and another script to run a .Net assembly. The reason why I HAVE to copy all the child.py files is because this assembly generates a resource file which is then used by it. If I don't copy child.py in each subfolders the resource files are copied at the root which creates conflicts when I call several processes using the multiprocessing module. If you have some suggestions about this overall architecture it is more than welcome :).

You can put the code from child.py into parent.py and call os.chdir() (after the fork) to run each multiprocessing.Process in its own working directory, or use the cwd parameter (it sets the current working directory for the subprocess) if you run the assembly using the subprocess module:

#!/usr/bin/env python
import os
import shutil
import tempfile
from multiprocessing import Pool

def init(topdir='.'):
    dir = tempfile.mkdtemp(dir=topdir) # parent is responsible for deleting it
    os.chdir(dir)

def child(n):
    return os.getcwd(), n*n

if __name__ == "__main__":
    pool = Pool(initializer=init)
    results = pool.map(child, [1, 2, 3])
    pool.close()
    pool.join()
    for dirname, _ in results:
        try:
            shutil.rmtree(dirname)
        except EnvironmentError:
            pass  # ignore errors
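If the assembly is launched via the subprocess module instead of multiprocessing, the cwd parameter gives the same isolation. A minimal sketch (the child command below is a stand-in for the real IronPython invocation):

```python
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()  # a fresh directory for this run; delete it later
# stand-in child command: the real call would launch ironpython + the assembly
out = subprocess.check_output(
    [sys.executable, '-c', 'import os; print(os.getcwd())'],
    cwd=workdir)  # the child starts in workdir, so its resource files land there
```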
jfs
  • I forgot to mention that I cannot use import since script2 actually generates script1 in a folder. Indeed I am doing some multiprocessing. Script2 creates say 10 directories where script1 is copied in each of them. Then I run each of the script1 from script2. And I want script2 to gather the results "returned" by all the script1. So script2 cannot import script1 since it is not generated yet, or maybe I can do some sort of dynamic import? I don't know. But I will have a look at json. Thanks! – Serge Oct 30 '14 at 14:30
  • 1
    @user2390615: 1. any additional info should go to your question instead of the comment so that others could easily find it. 2. If the script is not generated yet then you can't run it using subprocess too. :) If the file already exists then you can import it (move the top-level code into a function, to avoid executing it on import). 3. Generating scripts on-the-fly and copying them around does not sound like a good design -- you should look more closely at it. – jfs Oct 30 '14 at 14:45
  • 1
    @user2390615: I've added architecture suggestions. – jfs Oct 30 '14 at 14:59
  • OK so you would recommend to import on the fly the child.py files in my parent.py files once the folders are created and the child.py files copied? Would that be a better way to do? – Serge Oct 30 '14 at 15:01
  • 1
    Is it clear that you should (once) include the code from `child.py` into the `child()` function above i.e., it should be the only source/no copying etc? – jfs Oct 30 '14 at 15:13
  • Yes yes. Thank you very much. I am still figuring out your example. I didn't know about pool initializers yet. I believe this will do; I just need some time to catch up with it. But yes, I agree, child.py should not be copied; only the .NET assembly should be copied into separate temporary folders. I'm working on it right now and will come back in a few minutes! – Serge Oct 30 '14 at 15:21
  • @Serge: there could be permission problem with `tempfile.mkdtemp()` (it might be too strict -- it doesn't allow the parent to delete it). The idea is to generate unique directory name and change working directory to it. – jfs Oct 30 '14 at 15:25
  • 1
    @Serge: I've fixed the permission problem (it is due to [sticky bit on tmp directory](http://askubuntu.com/q/432699/3712), if you use anything else it works as is). I've updated the code. – jfs Oct 30 '14 at 15:32
  • Ok great! I haven't tried it yet, but I could always create a folder with shutil as I did if there were any problem with tempfile. So yes, I generate the folders and change the working directory to each of them when I run my .NET assembly, and all is done from the parent.py file. Thank you very much again! – Serge Oct 30 '14 at 15:35