I am writing C++ code using OpenMP. I have a huge global array (100,000+ elements) whose elements are updated by adding values inside a for loop. Is there an efficient way to have each OpenMP thread keep its own local copy of the array and then combine the copies after the loop? Since the number of threads is variable, I cannot create the local copies beforehand. If I use a single global copy and prevent the race condition with a synchronization lock, the performance is terrible.
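A minimal sketch of the per-thread-copy idea, assuming the positions and values can be precomputed into two vectors (`positions`, `values`, and the function name `scatter_add_private` are hypothetical, not from the question). Each thread allocates its private buffer *inside* the parallel region, so the thread count does not need to be known in advance, and the join happens once per thread rather than once per element.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: adds values[i] into huge_array[positions[i]] for all i,
// using one private accumulation buffer per OpenMP thread.
void scatter_add_private(std::vector<int>& huge_array,
                         const std::vector<std::size_t>& positions,
                         const std::vector<int>& values)
{
    assert(positions.size() == values.size());
    const std::size_t N = huge_array.size();
    const long long n = static_cast<long long>(positions.size());

    #pragma omp parallel
    {
        // Each thread gets its own zero-initialised copy; allocated here,
        // so it works for whatever number of threads the runtime starts.
        std::vector<int> local(N, 0);

        // Iterations are split among threads; no locking while filling local.
        #pragma omp for
        for (long long i = 0; i < n; i++)
            local[positions[i]] += values[i];

        // Join step: merge this thread's copy into the shared array.
        // The critical section is entered once per thread, not per element.
        #pragma omp critical
        for (std::size_t j = 0; j < N; j++)
            huge_array[j] += local[j];
    }
}
```

The merge is serialized, costing roughly (number of threads) × N additions, and each thread needs N extra ints of memory. For 100,000 ints that is about 400 KB per thread, which is usually far cheaper than taking a lock on every update.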
Edit: Sorry for not being clear. Here is some pseudo-code that I hope clarifies the scenario:
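int* huge_array = new int[N];
memset(huge_array, 0, N * sizeof(int));   // needs <cstring>
omp_lock_t lock;
omp_init_lock(&lock);
#pragma omp parallel for
for (int i = 0; i < n; i++)
{
    int v = get_value(i);      // hypothetical: get a value v independently
    int p = get_position(i);   // hypothetical: get a position p independently
    // I have to set a lock here, otherwise threads race on huge_array[p]
    omp_set_lock(&lock);
    huge_array[p] += v;
    omp_unset_lock(&lock);
}
omp_destroy_lock(&lock);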
Is there a way to improve the performance of the code above?
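One way to get the same "local copy per thread, then join" behaviour without managing the copies by hand is a reduction over an array section, which OpenMP supports from version 4.5 (for example GCC 6 and later with `-fopenmp`). The sketch below assumes such a compiler; `scatter_add_reduction`, `positions`, and `values` are hypothetical names standing in for the question's "get a position p / get a value v" steps.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: the reduction clause on huge_array[0:N] asks the
// OpenMP runtime to give each thread a private, zero-initialised copy of
// the N elements and to add all copies into huge_array after the loop.
void scatter_add_reduction(int* huge_array, std::size_t N,
                           const std::size_t* positions, const int* values,
                           long long n)
{
    #pragma omp parallel for reduction(+ : huge_array[0:N])
    for (long long i = 0; i < n; i++)
    {
        assert(positions[i] < N);
        // Each thread updates its own private copy; no lock is needed.
        huge_array[positions[i]] += values[i];
    }
}
```

Some implementations may place the private copies on each thread's stack, so for very large arrays the `OMP_STACKSIZE` environment variable may need to be raised. If per-thread copies are undesirable, replacing the lock in the original loop with `#pragma omp atomic` on the `huge_array[p] += v;` statement is a lighter-weight alternative, though its speed depends on how often threads hit the same element.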