I want to efficiently parallelize the following sum in C:
#pragma omp parallel for num_threads(nth)
for(int i = 0; i < l; ++i) pout[pg[i]] += px[i];
where px is a pointer to a double array x of size l containing some data, pg is a pointer to an integer array g of size l that assigns each data point in x to one of ng groups, which occur in a random order, and pout is a pointer to a double array out of size ng, which is initialized with zeros and contains the result of summing x over the grouping defined by g.
The code above works, but the performance is not optimal, so I wonder if there is something I can do in OpenMP (such as a reduction() clause) to improve the execution. The dimensions l and ng of the arrays and the number of threads nth are available to me and fixed beforehand. I cannot access the arrays directly; only the pointers are passed to a function that does the parallel sum.
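
For illustration, the kind of reduction() clause I have in mind might look like the sketch below. It assumes a compiler supporting OpenMP 4.5 or later, where array sections are allowed in reduction clauses, and the function name grouped_sum is only a placeholder for my actual function. As far as I understand, each thread would then accumulate into its own zero-initialized private copy of the ng output elements, which would also avoid two threads updating the same pout element at the same time, as can happen in the plain loop above.

```c
#include <stddef.h>

/* Hypothetical wrapper (the name and signature are assumptions for illustration).
   Adds px[i] to pout[pg[i]] for i = 0..l-1; pout must hold ng doubles. */
void grouped_sum(double *pout, const double *px, const int *pg,
                 int l, int ng, int nth)
{
    /* Array-section reduction (OpenMP >= 4.5): every thread works on a
       private copy of pout[0..ng-1] initialized to 0, and the copies are
       added into the original pout when the loop finishes.
       Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for num_threads(nth) reduction(+:pout[:ng])
    for (int i = 0; i < l; ++i)
        pout[pg[i]] += px[i];
}
```

This would be called as grouped_sum(pout, px, pg, l, ng, nth). Whether it is actually faster presumably depends on ng, since every thread needs its own copy of ng doubles and the copies have to be combined at the end.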