2

I want to add multi-threading to a program so that I can speed up a task by running two or more concurrent processes to work through a loop.

An outline of the code (without multi threading) is

double * a, *b, *c, *d; 
int n=10000;
int i, j;
a=(doube *)calloc(n, sizeof(double)); 
b=(doube *)calloc(n, sizeof(double)); 
c=(doube *)calloc(n, sizeof(double)); 
d=(doube *)calloc(n, sizeof(double)); 

setup(a, b); //routine to set the inital values of arrays a and b 

for (i=0; i<n; i++)
{
  for (j=0; j<n; j++)
  {
    if (i==j) continue;
    c[i]+=func(a[i],a[j]);   //calculation functions 
    d[j]+=func2(b[i],b[j]);
  }
}

So my plan is to use the numbers in the a and b arrays to calculate values in c and d.

I want to multithread the loop at the end of the code fragment above without running into speed issues due to memory access. - I want to split the loop so that different threads each run over the part of the range of i.

I can see three possible ways of proceeding here.

[1] single arrays for a,b,c,d- but potential conflicts due to two threads trying to write to the same number at the same time.

[2] single arrays for a,b but multiple arrays for c, d are created so that there is one copy per thread. Thus threads will all be reading from the same arrays a,b but they will be writing different arrays c,d to avoid possible collisions - these multiple arrays for c,d would be then combined together after all the threads are finished

[3] multiple arrays for a,b,c,d - a copy of each array is made for each different thread so there are no read or write 'collisions' and again the multiple arrays for c,d would be then combined together after all the threads are finished

I expect the answer is not [1] but would really appreciate suggestions on whether option [2] or [3] would be better. [3] may be best, but has overheads of copying the input data of a,b for each thread.

Note that I have searched for similar questions and found some useful things (e.g. Memory considerations with multithreading), but I have not found a clear answer to this question.

tom
  • 1,303
  • 10
  • 17

2 Answers2

1

Having multiple threads read the same data (as long as there's no writing) isn't a problem.

You won't have to worry about writes either if one thread takes care of updating c and the other takes care of updating d.

So you have one thread with a nested loop taking care of c, and another thread with its own nested loop taking care of d. And since each thread is writing different data sets, you don't need to worry about synchronizing them.

If you do that, [1] is fine.

dbush
  • 205,898
  • 23
  • 218
  • 273
  • Many thanks,,.. the outline of the code is a simplification... `a,b` with indexes `i,j` get sent away to a function along with pointers to `c,d` with indexes `i,j` because it is most efficient to calculate `c,d` at the same time..... So I can see the logic of your answer that [1] would be fine based on the question as originally stated, but you have given enough information to make it clear that [2] will be best. Thanks for your clear answer. – tom Mar 25 '21 at 01:12
  • both your answer and the answer of @NateEldredge were posted within 1 minute of each other plus both of you have written excellent clear answers. After refreshing this page a number of times I can now see that Nate posted just posted first - so I am going to accept Nate's answer - but I would have been happy to accept your answer too - apologies that I can only accept one and I have decided to go with the answer that was posted first, even though the time different is less than one minute. – tom Mar 25 '21 at 01:21
0

[2] is likely best.

Unsynchronized writes, or writes and reads, to the same object by multiple threads is a data race, which causes undefined behavior. You can add locking or atomicity to avoid this, but it will typically be a lot slower. So [1] is unacceptable. Even if the threads access different but nearby elements of the array, you can expect a slowdown due to cache line ping-pong.

But there is no problem with having several threads reading the same object at the same time, as long as nobody is writing. In fact, it is best to have just one copy of the object being read; it saves memory and cache space. Thus [2] is more efficient than [3].

Nate Eldredge
  • 48,811
  • 6
  • 54
  • 82