How to train my neural network faster by running CPU and GPU in parallel

Question

I'm trying to train a (pretty big) neural network on a GPU. The network is written in PyTorch, and I'm using Python 3.6.3 on Ubuntu 16.04. The code runs, but it takes about twice as long as it should, because my data-grabbing step (on the CPU) runs in series with the training step (on the GPU). Essentially, I grab a mini-batch from file with a mini-batch generator, send that mini-batch to the GPU, and then train the network on it. I've timed the two steps (grabbing a mini-batch and training on it), and they take about the same time (around 200 ms each). I'd like to do something similar to Keras' fit_generator method, which runs the data grabbing in parallel with the training: it builds a queue of mini-batches that are handed to the GPU whenever it is ready to train on the next one. What is the best way to do that? For concreteness, my data generator and training code look something like this (simplified):

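    import h5py
    import torch
    import torch.nn.functional as F

    # This generator opens the HDF5 file and yields one mini-batch at a time
    def data_gen(PATH, batch_size=32):
        with h5py.File(PATH, 'r') as f:
            n_samples = f['X'].shape[0]
            for start in range(0, n_samples, batch_size):
                X = f['X'][start:start + batch_size]
                Y = f['Y'][start:start + batch_size]
                yield X, Y

    # net, optimizer, epochs and PATH are defined elsewhere
    for epoch in range(epochs):
        for mini_X, mini_Y in data_gen(PATH):
            # Copy the NumPy mini-batch to the GPU as float32 tensors
            mini_X = torch.as_tensor(mini_X, dtype=torch.float32).cuda()
            mini_Y = torch.as_tensor(mini_Y, dtype=torch.float32).cuda()
            optimizer.zero_grad()  # clear gradients left over from the previous step
            out = net(mini_X)
            loss = F.binary_cross_entropy(out, mini_Y)
            loss.backward()
            optimizer.step()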

Something like that. As you can see, I use data_gen as an actual generator in the for loop, so it runs sequentially with the training. I would like to run it in parallel and have it fill a queue of mini-batches that I can then feed to my network. Currently one epoch takes more than 5 hours; I think a parallelized version could bring that down to 3 hours or less. I looked into Python's multiprocessing module, but the official documentation was a bit dense for me, since I have only limited experience with parallel computing. Pointers to any resources I could look at would also be very helpful. Thanks!

Really, this whole question boils down to: how can I manage concurrent processes in Python, where one process is CPU-intensive and the others are not, because their work is computed on the GPU? See https://stackoverflow.com/q/2846653/4013571 — Alexander McFarlane, Dec 11 '17 at 11:27
Thanks, I will look into that. I did find a torch.multiprocessing module and tried it out yesterday, but I'm running into a memory error on my GPU (there was no memory error before), so I must be doing something wrong. — enumaris, Dec 11 '17 at 17:13

score 0 · Answer 1 · answered Dec 11 '17 at 11:20


You will need to use threads for the data generation. The idea is to let the CPU handle the data generation (usually loading) while the GPU does the training. That being said, it is usually not the CPU that slows things down; it is the constant reading and writing of files. If you are using a dataset, make sure the files are copied or extracted into contiguous blocks on your file system. If your files are fragmented across your hard drive, loading them will be a bottleneck regardless of the multi-threading mechanism you use. With SSDs, fragmentation is not noticeable.
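A minimal sketch of the thread-based approach, in the spirit of Keras' fit_generator queue: a background thread consumes the question's data_gen and keeps a bounded queue.Queue filled while the main loop trains. The class name BackgroundGenerator and its max_prefetch parameter are hypothetical, not part of PyTorch or Keras. Whether loading and training actually overlap depends on how much of the loading code releases Python's GIL; if it mostly does not, a process-based loader is the safer choice.

```python
import queue
import threading


class BackgroundGenerator:
    """Consume a generator in a daemon thread, buffering up to max_prefetch items.

    Iterating over the wrapper yields the same items in the same order as the
    wrapped generator, but later items are produced while the caller is still
    busy with the current one (e.g. training on the current mini-batch).
    """

    _DONE = object()  # sentinel placed on the queue when the generator ends

    def __init__(self, generator, max_prefetch=4):
        # A bounded queue caps how many mini-batches sit in memory at once.
        self._queue = queue.Queue(maxsize=max_prefetch)
        self._error = None
        self._thread = threading.Thread(
            target=self._produce, args=(generator,), daemon=True
        )
        self._thread.start()

    def _produce(self, generator):
        try:
            for item in generator:
                self._queue.put(item)  # blocks while the queue is full
        except Exception as exc:  # hand loader errors over to the consumer
            self._error = exc
        finally:
            self._queue.put(self._DONE)

    def __iter__(self):
        while True:
            item = self._queue.get()  # blocks until the next item is ready
            if item is self._DONE:
                if self._error is not None:
                    raise self._error
                return
            yield item
```

With the question's code, the inner loop would become `for mini_X, mini_Y in BackgroundGenerator(data_gen(PATH), max_prefetch=8):` with the training body unchanged. Because data_gen opens the HDF5 file inside its own body, the file is opened and read in the background thread. The bounded queue keeps memory use predictable, and the daemon flag keeps the thread from blocking interpreter exit.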

Mo Hossny

Right, the file is too big for an SSD (I can't afford an SSD that big, heh), but I did benchmark grabbing mini-batches, and it takes just a hair less time than it takes my GPU to train on them (both are around 200 ms). If my neural net were much smaller, loading the data would be the bottleneck. That being said, I was wondering what the best way to multi-thread this process is, since I have little experience with multi-threading. – enumaris Dec 11 '17 at 17:12
You do not need to worry about threading now. You can use the data loaders supplied by PyTorch (torch.utils.data.DataLoader); with num_workers > 0 they load batches in background worker processes and take care of the parallelism for you. Check my answer at https://stackoverflow.com/a/45118712/7387369 – Mo Hossny Dec 12 '17 at 06:14
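A minimal sketch of what that could look like for the question's HDF5 file, assuming it holds datasets named 'X' and 'Y' with samples along the first axis (as in data_gen). H5Dataset and make_loader are hypothetical names; Dataset, DataLoader and the batch_size, shuffle, num_workers and pin_memory arguments are PyTorch's own. The file is opened lazily in __getitem__ so that each worker process gets its own h5py handle instead of inheriting one from the parent.

```python
import h5py
import torch
from torch.utils.data import DataLoader, Dataset


class H5Dataset(Dataset):
    """Map-style dataset over an HDF5 file with 'X' and 'Y' datasets (hypothetical layout)."""

    def __init__(self, path):
        self.path = path
        self._file = None  # opened lazily, once per worker process
        # Read only the number of samples up front, then close the file again.
        with h5py.File(path, 'r') as f:
            self._len = f['X'].shape[0]

    def __len__(self):
        return self._len

    def __getitem__(self, idx):
        if self._file is None:
            self._file = h5py.File(self.path, 'r')
        x = torch.as_tensor(self._file['X'][idx], dtype=torch.float32)
        y = torch.as_tensor(self._file['Y'][idx], dtype=torch.float32)
        return x, y


def make_loader(path, batch_size=32, num_workers=4, shuffle=False):
    # Worker processes read and collate mini-batches while the main process
    # trains; pinned memory speeds up host-to-GPU copies when CUDA is present.
    return DataLoader(
        H5Dataset(path),
        batch_size=batch_size,
        shuffle=shuffle,  # False mirrors the sequential order of data_gen
        num_workers=num_workers,
        pin_memory=torch.cuda.is_available(),
    )
```

In the training loop, iterate over make_loader(PATH) instead of data_gen(PATH) and move each batch with mini_X.cuda(non_blocking=True); a non-blocking copy can only overlap with computation when the source tensor is in pinned memory. Reading one sample per __getitem__ call may be slower than slicing whole mini-batches out of HDF5, so it is worth timing this against the original 200 ms figure.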
