1

I'm trying to parallelize a ray tracer in C, but the execution time is not dropping as the number of threads increases. The code I have so far is:

main2(thread function):

float **result=malloc(width * height * sizeof(float*)); /* one slot per pixel: count runs up to width*height */
int count=0;
for (int px=0; px<width; ++px)
{
     ...
     for (int py=0; py<height; ++py)
     {
         ...
         float *scaled_color=malloc(3*sizeof(float));
         scaled_color[0]=...
         scaled_color[1]=...
         scaled_color[2]=...

         result[count]=scaled_color;
         count++;
         ...
      }
}
...
return (void *) result;

main:
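pthread_t threads[nthreads];
int ids[nthreads]; /* one stable argument per thread; &i would change while the threads read it */
 for (i=0;i<nthreads;i++)
 {
      ids[i]=i;
      pthread_create(&threads[i], NULL, main2, &ids[i]);
 }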

 float** result_handler;

 for (i=0; i<nthreads; i++)
 {
      pthread_join(threads[i], (void **) &result_handler);
      int count=0;

      for(j=0; j<width;j++)
     {
          for(k=0;k<height;k++)
          {
               float* scaled_color=result_handler[count];
               count ++;
               printf...
           }
           printf("\n");
       }
  }

main2 returns a float ** so that the picture can be printed in order in the main function. Does anyone know why the execution time is not dropping (e.g. it runs longer with 8 threads than with 4 threads, when it's supposed to be the other way around)?

MPelletier
Trisha
  • 1
    Adding threads doesn't magically make your computer faster ... you haven't specified how many cores you have, and unless you have at least 8, 8 compute-bound threads will slow you down by the amount of overhead for managing and switching the threads. – Jim Balter Mar 06 '11 at 23:48
  • 1
    ...and possibly cause more cache misses, slowing things down even further. – dmckee --- ex-moderator kitten Mar 06 '11 at 23:52
  • Related: [Does it make sense to spawn more than one thread per processor?](http://stackoverflow.com/q/503551/2509). – dmckee --- ex-moderator kitten Mar 06 '11 at 23:57
  • possible duplicate of [call a function many times in parallel](http://stackoverflow.com/questions/10217719/call-a-function-many-times-in-parallel) – Peter G. May 30 '12 at 14:51

3 Answers

8

It's not enough to add threads; you need to actually split the task as well. It looks like you're doing the same job in every thread, so you get n copies of the result with n threads.
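As a rough illustration of what splitting the task could look like with pthreads, here is a minimal sketch in which each thread renders its own range of columns into one shared buffer. The names `slice_t`, `shade_pixel`, `render_slice` and `render_parallel` are made up for this example, and `shade_pixel` is only a placeholder for the ray-tracing code the question elides.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical work description: one thread owns columns [x_begin, x_end). */
typedef struct {
    int x_begin, x_end;   /* half-open column range for this thread */
    int height;
    float *image;         /* shared buffer, 3 floats per pixel, column by column */
} slice_t;

/* Placeholder for the real per-pixel ray tracing, which the question elides. */
static void shade_pixel(int px, int py, float rgb[3])
{
    rgb[0] = (float)px;
    rgb[1] = (float)py;
    rgb[2] = 0.0f;
}

/* Thread function: renders only the columns assigned to this slice. */
static void *render_slice(void *arg)
{
    slice_t *s = arg;
    for (int px = s->x_begin; px < s->x_end; ++px)
        for (int py = 0; py < s->height; ++py)
            shade_pixel(px, py, &s->image[3 * ((size_t)px * s->height + py)]);
    return NULL;
}

/* Renders the image with nthreads workers; returns a malloc'd buffer or NULL. */
float *render_parallel(int width, int height, int nthreads)
{
    assert(width > 0 && height > 0 && nthreads > 0);
    float *image = malloc((size_t)width * height * 3 * sizeof *image);
    pthread_t *threads = malloc((size_t)nthreads * sizeof *threads);
    slice_t *slices = malloc((size_t)nthreads * sizeof *slices);
    if (!image || !threads || !slices) {
        free(image); free(threads); free(slices);
        return NULL;
    }

    int started = 0;
    for (; started < nthreads; ++started) {
        /* Thread i gets columns [width*i/n, width*(i+1)/n): disjoint, covering all. */
        slices[started] = (slice_t){
            .x_begin = (int)((long)width * started / nthreads),
            .x_end   = (int)((long)width * (started + 1) / nthreads),
            .height  = height,
            .image   = image,
        };
        if (pthread_create(&threads[started], NULL, render_slice,
                           &slices[started]) != 0)
            break;
    }
    /* Join only after every thread has been started. */
    for (int i = 0; i < started; ++i)
        pthread_join(threads[i], NULL);

    free(threads);
    free(slices);
    if (started < nthreads) {   /* a thread could not be created */
        free(image);
        return NULL;
    }
    return image;
}
```

Each thread gets its own `slice_t`, so no thread reads a loop variable that is still changing. Writing into a single preallocated buffer also avoids one `malloc` per pixel, which may itself limit scaling when several threads allocate at once.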

Erik
3

Parallelism of programs and algorithms is usually nontrivial to achieve and doesn't come without some investment.

I don't think that working directly with threads is the right tool for you. Try looking into OpenMP; it is much more high-level.
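For comparison, a sketch of the same pixel loop with OpenMP might look like the following. The function names `trace_pixel` and `render_openmp` are invented for illustration, `trace_pixel` stands in for the real ray-tracing code, and the `schedule(dynamic)` clause is just one plausible choice for pixels of uneven cost.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder for the real per-pixel ray tracing, which the question elides. */
static void trace_pixel(int px, int py, float rgb[3])
{
    rgb[0] = (float)px;
    rgb[1] = (float)py;
    rgb[2] = 0.0f;
}

/* Renders into one contiguous buffer (3 floats per pixel); NULL on failure. */
float *render_openmp(int width, int height)
{
    assert(width > 0 && height > 0);
    float *image = malloc((size_t)width * height * 3 * sizeof *image);
    if (!image)
        return NULL;

    /* Columns are handed out to the threads of the team; each pixel is
       written by exactly one thread, so no locking is needed. */
    #pragma omp parallel for schedule(dynamic)
    for (int px = 0; px < width; ++px)
        for (int py = 0; py < height; ++py)
            trace_pixel(px, py, &image[3 * ((size_t)px * height + py)]);

    return image;
}
```

With GCC or Clang this needs `-fopenmp` to run in parallel; without the flag the pragma is ignored and the loop simply runs serially, which makes it easy to compare timings.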

Jens Gustedt
  • OpenMP is very unnatural to learn (and I wouldn't use it in a serious production environment since it is difficult to abstract) but it is 1) easy to learn 2) standard 3) easy to use to parallelize huge loops. – Alexandre C. Mar 07 '11 at 08:03
  • 1
    @Alexandre, can you explain what you mean by "*unnatural*"? – Jens Gustedt Mar 07 '11 at 08:45
  • I always found OpenMP's syntax difficult for doing more elaborated stuff than parallelizing a for loop. It seriously lacks proper OO abstraction. – Alexandre C. Mar 07 '11 at 09:48
  • @Alexandre, ok I see what you mean. But in many, many applications like the one here `for`-loops are *the* tool for structuring the code. I think the original language to program such problems is called *for*tran :) – Jens Gustedt Mar 07 '11 at 10:41
  • yes, this particular application will probably benefit from one line of OpenMP (hence my upvote). – Alexandre C. Mar 07 '11 at 10:43
0

Two things are working against you here. (1) Unless your threads can run on more than one core, you can't expect a speedup in the first place: a single core has the same amount of work to do whether you parallelize the code or not. (2) Even with multiple cores, parallel performance is exquisitely sensitive to the ratio of computation done on-core to the communication needed between cores. With pthread_join() inside the loop, you're incurring a lot of these 'stop and wait for the other guy' performance hits.
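On point (1), one way to avoid starting more compute-bound threads than there are cores is to ask the system how many are online. This is a minimal sketch: `pick_thread_count` is a hypothetical helper, and `_SC_NPROCESSORS_ONLN` is widely available on Linux and macOS but is an assumption about your platform.

```c
#include <assert.h>
#include <unistd.h>

/* Caps the requested thread count at the number of online cores. */
int pick_thread_count(int requested)
{
    assert(requested > 0);
    long cores = sysconf(_SC_NPROCESSORS_ONLN);
    if (cores < 1)
        cores = 1;   /* query failed or is unsupported: assume one core */
    return requested < cores ? requested : (int)cores;
}
```

Calling `pick_thread_count(8)` on a four-core machine would give 4, which matches the observation that 8 threads ran no faster than 4.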

JustJeff