
I'm running a completely parallel matrix multiplication program on a Mac Pro with a Xeon processor. I create 8 threads (as many threads as cores), and there are no shared-write issues (no two threads write to the same location). For some reason, my version using pthread_create and pthread_join is about twice as slow as the one using OpenMP's #pragma omp.

There are no other differences in anything... same compile options, same number of threads in both cases, same code (except the pragma/pthread portions obviously), etc.

And the loops are very big -- I'm not parallelizing small loops.

(I can't really post the code because it's school work.)

Why might this be happening? Doesn't OpenMP use POSIX threads itself? How can it be faster?

user541686
  • Do they both use the same amount of cumulative CPU time? – Gabe Apr 13 '11 at 03:45
  • Have you verified that OpenMP is using the same number of threads as your manual version? – Gabe Apr 13 '11 at 03:49
  • What happens if you only use 7 threads on each? – Jess Apr 13 '11 at 03:51
  • @Jess: Brilliant question!! I tried it, it was faster... it turned out I was *creating* 8 threads, but I already had a master thread, for a total of 9, which is one more than the number of cores! (Wow, haha...) Feel free to put that as the answer so I'll accept it. :) – user541686 Apr 13 '11 at 04:09

1 Answer


(edited) What is your main thread doing? Without seeing your code, I was guessing that the main thread is barely running but still eating up clock cycles while the pthreads finish, and that it only resumes real work afterwards. Each time it's given cycles, there is overhead from pausing and resuming the other threads.

In OpenMP, the main thread probably goes to sleep, and waits for a wake-up event when the parallel regions finish.
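As the comments on the question worked out, 8 created threads plus the main thread makes 9 on 8 cores. One way to avoid that is to create one thread fewer than the number of cores and let the main thread take a share of the rows itself. Here is a minimal sketch of that idea, assuming square row-major matrices. All names (`slice_t`, `multiply_rows`, `matmul_pthreads`) are made up for illustration and are not the asker's code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical work descriptor: one contiguous band of rows of C = A * B. */
typedef struct {
    const double *a, *b;
    double *c;
    int n;                  /* matrices are n x n, row-major */
    int row_begin, row_end; /* half-open range of rows to compute */
} slice_t;

static void *multiply_rows(void *arg) {
    slice_t *s = arg;
    for (int i = s->row_begin; i < s->row_end; i++) {
        for (int j = 0; j < s->n; j++) {
            double sum = 0.0;
            for (int k = 0; k < s->n; k++)
                sum += s->a[i * s->n + k] * s->b[k * s->n + j];
            s->c[i * s->n + j] = sum; /* each thread writes only its own rows */
        }
    }
    return NULL;
}

/* Uses `nthreads` threads in total, COUNTING the caller: only nthreads - 1
 * are created, and the calling thread computes slice 0 itself.
 * Returns 0 on success, -1 on bad arguments or allocation failure. */
int matmul_pthreads(const double *a, const double *b, double *c,
                    int n, int nthreads) {
    if (n < 0 || nthreads < 1) return -1;

    pthread_t *tid = malloc(sizeof *tid * (size_t)nthreads);
    slice_t *sl = malloc(sizeof *sl * (size_t)nthreads);
    if (!tid || !sl) { free(tid); free(sl); return -1; }

    /* Split the rows as evenly as possible across all slices. */
    for (int t = 0; t < nthreads; t++) {
        sl[t].a = a; sl[t].b = b; sl[t].c = c; sl[t].n = n;
        sl[t].row_begin = (int)((long)n * t / nthreads);
        sl[t].row_end   = (int)((long)n * (t + 1) / nthreads);
    }

    /* Spawn workers for slices 1..nthreads-1. */
    int spawned = 0;
    for (int t = 1; t < nthreads; t++) {
        if (pthread_create(&tid[t], NULL, multiply_rows, &sl[t]) != 0) break;
        spawned++;
    }

    multiply_rows(&sl[0]);                    /* the caller does its share */
    for (int t = spawned + 1; t < nthreads; t++)
        multiply_rows(&sl[t]);                /* fallback if a create failed */
    for (int t = 1; t <= spawned; t++)
        pthread_join(tid[t], NULL);

    free(tid);
    free(sl);
    return 0;
}
```

Calling `matmul_pthreads(a, b, c, n, 8)` on an 8-core machine then keeps 8 busy threads rather than 9. Build with something like `cc -O2 -pthread`.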

Jess
  • Um... that's not exactly what I meant haha. I meant that there was one more thread than the number of cores, so they were competing for processing time. (The management overhead is negligible here, since it's definitely not *more* than that of OpenMP.) – user541686 Apr 13 '11 at 04:34
  • In OpenMP the initial thread (or in your terminology the main thread) does work along with the rest of the team in the worksharing regions. None of the threads sleep unless there is no further work to be done (and then the threads either sleep or spin-wait at the barrier, depending on the implementation). – ejd Apr 13 '11 at 14:53
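
To illustrate the point about the initial thread taking part in the team, here is a rough sketch of the OpenMP counterpart. It counts how many rows were computed by thread number 0, which is the thread that entered the parallel region. The function name `matmul_openmp` and the row counter are made up for illustration, and the `_OPENMP` guards are only there so the sketch still builds (and runs serially) when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Computes C = A * B for n x n row-major matrices.
 * Returns the number of rows that were computed by the initial thread
 * (thread number 0 of the team), or n when built without OpenMP. */
int matmul_openmp(const double *a, const double *b, double *c, int n) {
    int rows_on_initial = 0;

    /* The thread that reaches this pragma becomes thread 0 of the team and
     * is handed loop iterations like every other team member. */
    #pragma omp parallel for schedule(static) reduction(+:rows_on_initial)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
#ifdef _OPENMP
        if (omp_get_thread_num() == 0)
            rows_on_initial++;
#else
        rows_on_initial++; /* serial build: the only thread does every row */
#endif
    }
    return rows_on_initial;
}
```

With GCC, `-fopenmp` enables the pragma. A nonzero return value on a multi-threaded run shows that the initial thread did part of the multiplication, so a team of 8 means 8 working threads in total, not 8 plus an idle master.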