8

I'm currently parallelizing a program using OpenMP on a 4-core Phenom II. However, I noticed that my parallelization does nothing for performance. Naturally I assumed I had missed something (false sharing, serialization through locks, ...), but I was unable to find anything like that. Furthermore, judging from the CPU utilization, the program seemed to be executing on only one core. From what I found, sched_getcpu() should give me the ID of the core on which the calling thread is currently scheduled. So I wrote the following test program:

#include <iostream>
#include <sstream>
#include <random>
#include <omp.h>
#include <utmpx.h>
#include <sched.h>   // declares sched_getcpu() on glibc
int main(){
    #pragma omp parallel
    {
        std::default_random_engine rand;   // one engine per thread
        int num = 0;
        #pragma omp for
        for(size_t i = 0; i < 1000000000; ++i) num += rand();
        auto cpu = sched_getcpu();         // core this thread is running on right now
        // Build the line in a buffer so the threads do not interleave it mid-line
        std::ostringstream os;
        os<<"\nThread "<<omp_get_thread_num()<<" on cpu "<<cpu<<std::endl;
        std::cout<<os.str()<<std::flush;
        std::cout<<num;
    }
}

On my machine this gives the following output (the random numbers will vary, of course):

Thread 2 on cpu 0 num 127392776
Thread 0 on cpu 0 num 1980891664
Thread 3 on cpu 0 num 431821313
Thread 1 on cpu 0 num -1976497224

From this I assume that all threads execute on the same core (the one with ID 0). To be more certain I also tried the approach from this answer. The results were the same. Additionally, using #pragma omp parallel num_threads(1) didn't make the execution slower (slightly faster, in fact), which lends credibility to the theory that all threads use the same CPU. However, the fact that the CPU is always displayed as 0 makes me somewhat suspicious. I also checked GOMP_CPU_AFFINITY, which was initially not set, so I tried setting it to 0 1 2 3, which, from what I understand, should bind each thread to a different core. That didn't make a difference either.

Since I develop on a Windows system, I use Linux in VirtualBox for my development. So I thought that maybe the virtual system couldn't access all cores. However, the VirtualBox settings show that the virtual machine should get all 4 cores, and running my test program 4 times simultaneously seems to use all 4 cores, judging from the CPU utilization (and the fact that the system became very unresponsive).

So my question is basically: what exactly is going on here? More to the point: is my deduction that all threads use the same core correct? If it is, what could be the reasons for that behaviour?

Community
Grizzly
  • 1
    Here's a common error: did you set the environment variable OMP_NUM_THREADS=4? – pyCthon Feb 21 '12 at 01:18
  • 1
    @pyCthon: `OMP_NUM_THREADS` does not seem to be set. However, since OpenMP does create 4 threads, I don't think I need to set it. – Grizzly Feb 21 '12 at 01:22
  • Weird. I think it might be something with your virtual machine. I tried the same code (even installed utmpx.h) and it seemed to work fine on an 8-core and a 16-core machine. – pyCthon Feb 21 '12 at 01:58
  • I read somewhere that the virtual machine (guest OS) runs as a single process inside your host OS. Could this be the cause of the behaviour that you are seeing? – maths-help-seeker Apr 01 '12 at 21:25
  • The same is happening to me on a 2 CPU x86-64 server with Scientific Linux 6. No IDE or virtual machine in sight. – Vladimir F Героям слава Jun 30 '13 at 16:58

4 Answers

6

After some experimentation I found out that the problem was that I was starting my program from inside the Eclipse IDE, which apparently set the affinity to use only one core. I thought I had the same problem when starting it from outside the IDE, but a repeated test showed that the program works just fine when started from the terminal instead of from inside the IDE.

Grizzly
  • These can be set via variables like these: https://web.archive.org/web/20220114064748/https://pages.tacc.utexas.edu/~eijkhout/pcse/html/omp-affinity.html – Y00 Sep 23 '20 at 14:01
2

I compiled your program using g++ 4.6 on Linux

g++ --std=c++0x -fopenmp test.cc -o test

The output was, unsurprisingly:

Thread 2 on cpu 2

Thread 3 on cpu 1
910270973
Thread 1 on cpu 3
910270973
Thread 0 on cpu 0
910270973910270973

The fact that 4 threads are started (if you have not set the number of threads in any way, e.g. using OMP_NUM_THREADS) should imply that the program is able to see 4 usable CPUs. I cannot guess why it is not using them but I suspect a problem in your hardware/software setting, in some environment variable, or in the compiler options.

baol
0

You should use #pragma omp parallel for
And yes, you're right about not needing OMP_NUM_THREADS. omp_set_num_threads(4); should also have done fine.

Nav
  • Why would I use `#pragma omp parallel for` if I want the threads to do things outside the loop (like writing their ID to the output)? And as I mentioned, it does create 4 threads by default; they just seem to be executed on the same core. – Grizzly Feb 21 '12 at 14:36
  • That's true too. By the way, if you don't say omp *parallel* for, then no parallelization happens in the loop. But of course you're inside a parallel section, so... The only other possible explanation I can think of is a lack of hardware support for your VirtualBox. Have you tried with other CPUs? http://superuser.com/questions/33723/getting-2-processors-to-work-with-virtualbox-on-dual-core-celeron – Nav Feb 22 '12 at 03:19
  • I did not. However as mentioned it is possible to use all cores from the vbox, so lack of support does seem unlikely – Grizzly Feb 23 '12 at 16:16
0

If you are running on Windows, try this:

c:\windows\system32\cmd.exe /C start /affinity F path\to\your\program.exe

/affinity 1 uses CPU0

/affinity 2 uses CPU1

/affinity 3 uses CPU0 and CPU1

/affinity 4 uses CPU2

/affinity F uses all 4 cores

The number is a hexadecimal bitmask: write it in binary and read the bits from the right. Each set bit selects the corresponding core, starting with CPU0.

You can verify the affinity while the program is running using Task Manager.

Krishnaraj
  • The vbox does have the correct affinity to use all cores (I checked, and besides, how else would it use all of them in my test with multiple starts of my test program?). Since I use Linux inside the vbox, that doesn't really help there. – Grizzly Feb 21 '12 at 14:38