
I have a problem with my library for neural networks. It uses multithreading to speed up computations, but after about 30-60 seconds of runtime my program no longer utilizes 100% of my i7 3610QM (4 cores, 8 threads).

Basically my processing looks like this (C# with pseudocode):

for each training example t in training set
    for each layer l in neural network
        Parallel.For(0, N, (int i)=>{l.processForward(l.regions[i])})
    for each layer l in neural network (but with reversed order)
        Parallel.For(0, N, (int i)=>{l.backPropagateError(l.regions[i])})

Where `regions` is the layer's list of precalculated regions of neurons to process. Every region is the same size, 1/N of the current layer, so the tasks are the same size, to minimize the chance that other threads need to wait for the longest task to finish.
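To make the scheme concrete, here is a minimal, self-contained sketch of that partitioning. `Layer`, `Regions`, and `ProcessForward` are hypothetical stand-ins for the real library types, not the actual code; only the region-splitting pattern is the point:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for the real layer type.
class Layer
{
    public int[][] Regions;       // one precalculated index region per worker
    public double[] Activations;

    public void ProcessForward(int[] region)
    {
        foreach (int neuron in region)
            Activations[neuron] += 1.0;   // placeholder for the real forward pass
    }
}

class RegionDemo
{
    static void Main()
    {
        int n = Environment.ProcessorCount;
        int regionSize = 128;
        var layer = new Layer { Activations = new double[n * regionSize] };

        // Split the layer into n equal regions so every task is the same size.
        layer.Regions = new int[n][];
        for (int r = 0; r < n; ++r)
        {
            layer.Regions[r] = new int[regionSize];
            for (int j = 0; j < regionSize; ++j)
                layer.Regions[r][j] = r * regionSize + j;
        }

        Parallel.For(0, n, i => layer.ProcessForward(layer.Regions[i]));

        double sum = 0;
        foreach (double a in layer.Activations) sum += a;
        // Every neuron was touched exactly once if the partitioning is correct.
        Console.WriteLine(sum == n * regionSize ? "OK" : "FAIL");
    }
}
```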

As I said, this processing scheme consumes 100% of my processor only for a short time and then drops to about 80-85%. In my case I set N to `Environment.ProcessorCount` (= 8).

I can share the whole code/repository if anyone is willing to help.

I tried to investigate, so I created a new console project, put an almost "Hello World" of Parallel.For() in it, and I simply can't tell what is going on. This might be a separate issue with Parallel.For(), but I'd like you to address this problem too. Here is the code:

using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        const int n = 1;

        while (true)
        {
            // Sequential version for comparison:
            //int counter = 0; for (int ii = 0; ii < 1000; ++ii) counter++;

            Parallel.For(0, n, (int i) => { int counter = 0; for (int ii = 0; ii < 1000; ++ii) counter++; });
        }
    }
}

In this code, I constantly (in a while loop) create one task (n = 1) that has some work to do (increment a counter one thousand times). As far as I know, Parallel.For blocks execution / waits for all parallel calls to finish. If that is true, it should be doing the same work as the commented section (given n = 1). But on my computer, this program uses 100% of the CPU, as if there were work for more than one thread! How is that possible? When I switch to the commented version, the program uses less than 20% of the CPU, which is what I expected. Please help me understand this behaviour.
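To check the blocking claim, here is a small sketch I can run: if `Parallel.For` really waits for all iterations, the counter must have its final value the moment the call returns (a shared counter needs `Interlocked`, since `++` on a shared variable is not thread-safe):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class BlockingDemo
{
    static void Main()
    {
        int counter = 0;

        // 8 parallel iterations, each incrementing the shared counter 1000 times.
        Parallel.For(0, 8, i =>
        {
            for (int j = 0; j < 1000; ++j)
                Interlocked.Increment(ref counter);  // thread-safe increment
        });

        // If Parallel.For returned before all iterations finished,
        // this could print less than 8000. It never does.
        Console.WriteLine(counter);
    }
}
```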

Przemek B
  • As far as I understand Parallel.For, it is for doing __work__ in parallel threads. In your example there is __no__ work done in those loops; a good compiler would completely delete them, as they produce no output. Also, setting the degree of parallelism to 100% is not a good idea anyway imo; 90% should be good enough and keeps the system nicely responsive. And do keep the __cost__ of going parallel in mind! – TaW Oct 26 '14 at 13:06
  • There is not enough information here to answer your first question. And your second code will spend most of its time on synchronization. Try changing it to `ii < 100000000`. – svick Oct 27 '14 at 18:11
  • Parallel.For was early, it has training-wheel problems. Covered in [this post](http://stackoverflow.com/a/25950320/17034). – Hans Passant Oct 27 '14 at 22:55

1 Answer


As @TaW said, there is a cost to going parallel. That's why `f()` and `Parallel.For(0, n, _ => f())` are not equivalent: the parallel version incurs thread scheduling and context-switching overhead. In your case the execution time of `f()` is comparable to the thread scheduling overhead, which is why the parallel version performs worse. `Parallel.For` does wait until the operation completes, but it completes so fast that several threads end up running on different CPU cores within a very short period of time (remember that each time you invoke `Parallel.For`, it may choose a different thread to run `f()` on).
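When the per-iteration body is that cheap, the usual fix is to hand each delegate invocation a large chunk of work instead of a single element, e.g. with `Partitioner.Create`. A minimal sketch (the array and the sum are just placeholders for real work):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class ChunkDemo
{
    static void Main()
    {
        var data = new int[1000000];
        for (int i = 0; i < data.Length; ++i) data[i] = 1;

        long total = 0;

        // Partitioner.Create splits [0, Length) into a few large ranges,
        // so the delegate runs once per range, not once per element.
        Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
        {
            long local = 0;                             // per-range accumulator
            for (int i = range.Item1; i < range.Item2; ++i)
                local += data[i];
            Interlocked.Add(ref total, local);          // one sync op per range
        });

        Console.WriteLine(total);
    }
}
```

This amortizes the scheduling cost over thousands of elements per delegate call instead of paying it on every element.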

As for the first part of the question, I guess the problem lies in the index range passed to `Parallel.For`. Instead of [0, number of CPU cores), it should equal the index range of the data.
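For example (with a made-up `input`/`output` pair standing in for the layer data), iterate over every element and let the scheduler decide how many threads to use:

```csharp
using System;
using System.Threading.Tasks;

class IndexRangeDemo
{
    static void Main()
    {
        double[] input = new double[10000];
        double[] output = new double[input.Length];
        for (int i = 0; i < input.Length; ++i) input[i] = i;

        // The range covers the data, not Environment.ProcessorCount;
        // no manual split into per-core regions is needed.
        Parallel.For(0, input.Length, i => output[i] = input[i] * 2);

        Console.WriteLine(output[9999]);
    }
}
```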

alpinsky
  • Ok, I understand the second issue now. I mean, 1000 iterations is too little work in the example above, and using Parallel.For creates a lot of overhead across many threads; that is why I get 100% CPU usage even with n=1. What I don't understand is the performance decreasing over time to about 80%. It can't be my system (Win 8.1), because when I execute the example above (instead of my real library code) I get constant 100% CPU usage. The example runs with n = Environment.ProcessorCount and 1 000 000 iterations in the loop. This gives a nice, non-dropping 100% usage, so what can I be missing? – Przemek B Oct 27 '14 at 20:31
  • I have read the first part of the question again and now I don't understand the way `Parallel.For` is used. How is it possible that the size of `l.regions` correlates with the number of cores? I guess the index should go from zero to the length of `l.regions`. The first two parameters to `Parallel.For` specify the index range, not the number of CPU cores to run the code on. – alpinsky Oct 27 '14 at 22:18
  • I wrote: "Where regions is the layer's list of precalculated regions of neurons to process. Every region is the same size, 1/N of the current layer, so the tasks are the same size, to minimize the chance that other threads need to wait for the longest task to finish." Before the whole heavy computation, the number of cores the network can use is set via the Threads property. The property accessor splits the layer's surface into this number of regions, so Parallel.For will be called over this number of regions. In the current setup it is not allowed to partition a layer into more regions than there are logical processors available. – Przemek B Oct 28 '14 at 11:42
  • I'm sorry for being inattentive. – alpinsky Oct 28 '14 at 13:07