
I'm trying to train a neural network in Python using PyBrain and Python's multiprocessing package.

Here is my code (it trains a simple neural network to learn the XOR function).

import pybrain.tools.shortcuts as pybrain_tools
import pybrain.datasets
import pybrain.supervised.trainers.rprop as pybrain_rprop
import multiprocessing
import timeit


def init_XOR_dataset():
    dataset = pybrain.datasets.SupervisedDataSet(2, 1)
    dataset.addSample([0, 0], [0])
    dataset.addSample([0, 1], [1])
    dataset.addSample([1, 0], [1])
    dataset.addSample([1, 1], [0])
    return dataset


def standard_train():
    net = pybrain_tools.buildNetwork(2, 2, 1)
    net.randomize()
    trainer = pybrain_rprop.RPropMinusTrainer(net, dataset=init_XOR_dataset())
    trainer.trainEpochs(50)


def multithreaded_train(threads=8):
    nets = []
    trainers = []
    processes = []
    data = init_XOR_dataset()

    for n in range(threads):
        nets.append(pybrain_tools.buildNetwork(2, 2, 1))
        nets[n].randomize()
        trainers.append(pybrain_rprop.RPropMinusTrainer(nets[n], dataset=data))
        processes.append(multiprocessing.Process(target=trainers[n].trainEpochs(50)))
        processes[n].start()

    # Wait for all processes to finish
    for p in processes:
        p.join()


if __name__ == '__main__':
    threads = 4
    iterations = 16

    t1 = timeit.timeit("standard_train()",
                       setup="from __main__ import standard_train",
                       number=iterations)
    tn = timeit.timeit("multithreaded_train({})".format(threads),
                       setup="from __main__ import multithreaded_train",
                       number=iterations)

    print "Execution time for single threaded training: {} seconds.".format(t1)
    print "Execution time for multi threaded training: {} seconds.".format(tn)

My code contains two training functions: one that runs single-threaded and one that is (supposedly) parallel, using the multiprocessing package.

As far as I can judge, my multiprocessing code is sound. But when I run it, the training doesn't use more than one core. I verified this by checking the run time: with threads = 4 on a 4-core machine it takes 4 times as long as a single run, while it should take roughly as long as one single-threaded run. I double-checked by looking at htop/atop.

I know about the Global Interpreter Lock (GIL), but the multiprocessing package is supposed to handle this.
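
For completeness, my understanding is that multiprocessing sidesteps the GIL by running each worker in its own interpreter process, roughly like this minimal sketch (not my actual training code; train_one is just a dummy stand-in for the work):

import multiprocessing


def train_one(epochs):
    # dummy stand-in for a training run; each process executes this
    # in its own interpreter, so the GIL is not shared between them
    total = 0
    for i in range(epochs * 100000):
        total += i
    return total


if __name__ == '__main__':
    processes = [multiprocessing.Process(target=train_one, args=(50,))
                 for _ in range(4)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()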

I also know about the issue where importing scipy can set the CPU affinity so that only one core is used. However, if I print the process affinity right after scipy is imported in the PyBrain package (print psutil.Process(os.getpid()).cpu_affinity()), I can see that the affinity is fine:

$ python ./XOR_PyBrain.py
[0, 1, 2, 3]
Execution time for single threaded training: 14.2865240574 seconds.
Execution time for multi threaded training: 46.0955679417 seconds.
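
For reference, this is roughly how I check the affinity (and, on Linux, how it could be reset with psutil if an import had pinned the process to one core):

import os
import multiprocessing
import psutil

proc = psutil.Process(os.getpid())
print(proc.cpu_affinity())  # e.g. [0, 1, 2, 3]

# if scipy had restricted the affinity, it could be widened again like this
# (cpu_affinity() as a setter works on Linux, but not on OS X):
proc.cpu_affinity(list(range(multiprocessing.cpu_count())))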

I observe this behaviour on my Debian NAS as well as on my Debian desktop and on my Mac.

Version info for the Debian NAS:

  • CPU: Intel(R) Atom(TM) CPU D2701 @ 2.13GHz
  • Debian 8.4. Kernel: 3.2.68-1+deb7u1 x86_64
  • Python 2.7.9
  • PyBrain 0.3
  • Scipy 0.14.0-2

So, my question is: how do I let PyBrain train on multiple cores?

  • Have you achieved that already? – mdargacz Jun 10 '16 at 10:35
  • @agtoever May I kindly draw your attention to a similar [question](https://stackoverflow.com/questions/56344611/how-can-take-advantage-of-multiprocessing-and-multithreading-in-deep-learning-us) regarding **multiprocessing** and **multi-threading**? I hope you can help. – Mario Jun 04 '19 at 20:45

1 Answer


Everything seems correct. I've tested your code and all cores were working. The problem with the timing comes from the fact that, when you process only a small amount of data, the CPU cores can't reach their full potential.

The network you built is very simple, so each process finishes before any core reaches 100% usage. Think of it as processing millions of 5x5 pixel images: the time the computer spends waiting for the HDD to deliver the data is far greater than the time the CPU needs to process them.

The same thing happens here, only with a much faster medium (RAM) and a much smaller amount of data (a single decimal number). If your computer had DDR4 RAM things might change, but I really don't think so.

Try a heavier workload: a bigger network, more data, more epochs, and so on, and you'll see the speed-up you expect.
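
For example, something along these lines (just a sketch with made-up sizes, not something I've benchmarked) gives each process enough work that the cores actually get saturated:

import random

import pybrain.tools.shortcuts as pybrain_tools
import pybrain.datasets
import pybrain.supervised.trainers.rprop as pybrain_rprop


def init_big_dataset(samples=5000, inputs=20):
    # a larger, random dataset so each trainer has real work to do
    dataset = pybrain.datasets.SupervisedDataSet(inputs, 1)
    for _ in range(samples):
        sample = [random.random() for _ in range(inputs)]
        dataset.addSample(sample, [sum(sample) % 1])
    return dataset


def heavy_train():
    # a much bigger network and many more epochs than the XOR example
    net = pybrain_tools.buildNetwork(20, 100, 100, 1)
    net.randomize()
    trainer = pybrain_rprop.RPropMinusTrainer(net, dataset=init_big_dataset())
    trainer.trainEpochs(200)

Drop heavy_train into your multiprocessing loop in place of the XOR trainer and the difference between the single-process and multi-process runs should become much easier to see.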
