I'm trying to train a neural network in Python using PyBrain and Python's multiprocessing package.
Here is my code; it trains a simple neural network to learn the XOR function.
import pybrain.tools.shortcuts as pybrain_tools
import pybrain.datasets
import pybrain.supervised.trainers.rprop as pybrain_rprop
import multiprocessing
import timeit


def init_XOR_dataset():
    dataset = pybrain.datasets.SupervisedDataSet(2, 1)
    dataset.addSample([0, 0], [0])
    dataset.addSample([0, 1], [1])
    dataset.addSample([1, 0], [1])
    dataset.addSample([1, 1], [0])
    return dataset


def standard_train():
    net = pybrain_tools.buildNetwork(2, 2, 1)
    net.randomize()
    trainer = pybrain_rprop.RPropMinusTrainer(net, dataset=init_XOR_dataset())
    trainer.trainEpochs(50)


def multithreaded_train(threads=8):
    nets = []
    trainers = []
    processes = []
    data = init_XOR_dataset()
    for n in range(threads):
        nets.append(pybrain_tools.buildNetwork(2, 2, 1))
        nets[n].randomize()
        trainers.append(pybrain_rprop.RPropMinusTrainer(nets[n], dataset=data))
        processes.append(multiprocessing.Process(target=trainers[n].trainEpochs(50)))
        processes[n].start()

    # Wait for all processes to finish
    for p in processes:
        p.join()


if __name__ == '__main__':
    threads = 4
    iterations = 16
    t1 = timeit.timeit("standard_train()",
                       setup="from __main__ import standard_train",
                       number=iterations)
    tn = timeit.timeit("multithreaded_train({})".format(threads),
                       setup="from __main__ import multithreaded_train",
                       number=iterations)
    print "Execution time for single threaded training: {} seconds.".format(t1)
    print "Execution time for multi threaded training: {} seconds.".format(tn)
In my code there are two functions: one that trains single-threaded and one that (supposedly) trains in parallel using the multiprocessing package.
As far as I can judge, my multiprocessing code is sound, but when I run it, it doesn't use more than one core. I verified this by checking the run time: with threads = 4 on a 4-core machine, the parallel version takes roughly four times as long, whereas it should take approximately as long as one single-threaded run. I double-checked by watching htop/atop.
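For what it's worth, per-core load can also be checked from Python with psutil (just an illustrative check, not part of the benchmark):

import psutil

# Per-core utilisation sampled over one second; with real parallelism,
# several of these numbers should sit near 100 while training is running.
print psutil.cpu_percent(interval=1, percpu=True)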
I know about the Global Interpreter Lock (GIL), but the multiprocessing package is supposed to sidestep it, since each Process runs in its own interpreter with its own GIL.
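For reference, a plain CPU-bound worker parallelised the same way should saturate multiple cores; a minimal sketch (busy_loop is purely illustrative):

import multiprocessing


def busy_loop(n):
    # CPU-bound work; expect one fully busy core per process
    total = 0
    for i in xrange(n):
        total += i * i
    return total


if __name__ == '__main__':
    processes = [multiprocessing.Process(target=busy_loop, args=(10 ** 7,))
                 for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()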
I also know about the issue where importing scipy resets the CPU affinity so that only one core is used. However, if I print the process affinity just after scipy is imported in the PyBrain package (print psutil.Process(os.getpid()).cpu_affinity()), I can see that the affinity is OK (see also the reset sketch after the output):
$ python ./XOR_PyBrain.py
[0, 1, 2, 3]
Execution time for single threaded training: 14.2865240574 seconds.
Execution time for multi threaded training: 46.0955679417 seconds.
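For completeness, if the scipy import had narrowed the affinity mask, my understanding is that it could be reset with psutil; a sketch, assuming a 4-core Linux box:

import os
import psutil
import pybrain.tools.shortcuts  # pulls in scipy, which may touch the affinity

# Re-allow all four cores in case the import narrowed the affinity mask.
p = psutil.Process(os.getpid())
p.cpu_affinity([0, 1, 2, 3])
print p.cpu_affinity()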
I observe this behaviour on my Debian NAS, on my Debian desktop, and on my Mac.
Version info for the Debian NAS:
- CPU: Intel(R) Atom(TM) CPU D2701 @ 2.13GHz
- Debian 8.4. Kernel: 3.2.68-1+deb7u1 x86_64
- Python 2.7.9
- PyBrain 0.3
- Scipy 0.14.0-2
So, my question is: how do I get PyBrain to train on multiple cores?