I have the following code:
omp_set_num_threads(10);
int N = 10000;
int chunk = 1000;
int i;
float a[N];
for (i=0; i < N; i++)
{
a[i] = i;
}
#pragma omp parallel private(i)
{
#pragma omp for schedule(dynamic,chunk) nowait
for (i=0; i < N; i++) a[i] = a[i] + a[i];
}
The sequential code is the same, without the two pragma directives.
sequential ~150us parallel ~1100us
That's a huge gap, and I expected it to be the other way around.
Does someone have a clue what's wrong, or does OpenMP have so much to do in the background? Does someone have an example, where I can see that the parallelized for loop is faster?
Thanks