I need an algorithm that executes a file 'test.py' on different computers at the same time (with different parameters), and on each computer, vectorized methods, e.g. from the numpy package, should be able to use multiple cores.
A minimal (non-working) example consists of the following two files.
file A: test.py
import numpy
import os  # to verify whether the process is allowed to use all cores

os.system("taskset -p 0xff %d" % os.getpid())
a = numpy.random.random((10000, 10000))  # some random matrix
b = numpy.dot(a, a)  # some parallelized calculation
and file B: control.py
import subprocess
import os
callstring = "python " + os.getcwd() + "/test.py"  # console command to start test.py
sshProcess = subprocess.Popen("ssh <pc-name>",
                              stdin=subprocess.PIPE,
                              shell=True)
sshProcess.stdin.write(callstring + "\n")  # start test.py on the remote machine
sshProcess.stdin.close()
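As a side note, the command can also be passed to ssh as an argument instead of being piped through stdin; a minimal sketch (the host name "<pc-name>" is a placeholder, and the runnable demonstration below uses the same Popen call shape with the current interpreter instead of ssh, so it works on a single machine):

```python
import os
import subprocess
import sys

# The remote variant would be (with a real host substituted for the placeholder):
#   subprocess.Popen(["ssh", "<pc-name>", "python " + os.getcwd() + "/test.py"])
# Passing the command as an argument list avoids writing to stdin entirely.

# Local, self-contained demonstration of the same Popen call shape:
proc = subprocess.Popen([sys.executable, "-c", "print(2 + 2)"],
                        stdout=subprocess.PIPE)
out, _ = proc.communicate()
print(out.decode().strip())  # -> 4
```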
Now, when I run control.py, the file test.py is being executed, however, only with a single core.
If I run test.py directly from the console via python test.py
(which I don't want), multiple cores are used instead.
Unfortunately, I am new to the subprocess module, and I am no expert on Linux systems either. However, I have been able to gather the following so far:
- Using
subprocess.call("python test.py", shell=True)
works, i.e. multiple cores are used then. However, I need the ssh to address other computers.
- Using the console manually, i.e. going via ssh to a different computer and running
python test.py
also gives the desired result: multiple cores are used. However, I need to automate this step, and hence I would like to create several of these 'ssh consoles' from Python code.
- Core affinity does not seem to be the problem (a typical numpy pitfall), as
os.system("taskset -p 0xff %d" % os.getpid())
prints 'current affinity ff, new affinity ff', i.e. all 8 possible cores are allowed.
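For completeness, the core check can also be done from within Python; a small sketch (multiprocessing.cpu_count reports the cores the OS sees, which is not necessarily the same as the allowed affinity set):

```python
import multiprocessing
import os

# Number of cores the operating system reports:
print(multiprocessing.cpu_count())

# The same taskset query as in test.py; prints the current affinity mask
# (requires the taskset utility, i.e. a Linux system):
os.system("taskset -p %d" % os.getpid())
```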
Therefore, it seems to be an issue of Popen in combination with ssh!?
Do you have any ideas or advice? Thanks for your help!
EDIT / ADDENDUM: I found out that parallelized methods from the 'multiprocessing' package DO run on multiple cores. So it seems to be a numpy problem after all. I apologize for not having tested this before!
I am using Python 2.7.12 and NumPy 1.11.1.