At the start of #pragma omp parallel a bunch of threads are created; then, when we get to #pragma omp for, the workload is distributed among them. What happens if this for loop has another for loop inside it, and I place a #pragma omp for before that one as well? Does each thread create new threads? If not, which threads are assigned this task? What exactly happens in this situation?

2 Answers
By default, no threads are spawned for the inner loop; it is executed sequentially by the thread that reaches it. This is because nested parallelism is disabled by default. If you enable nesting via omp_set_nested(), however, a new team of threads will be spawned for the inner loop.
Be careful, though: this can result in p^2 threads, since each of the original p threads spawns another p threads. That risk of oversubscription is the reason nesting is disabled by default.
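As a rough illustration of that behaviour (a minimal sketch, assuming a nested pair of parallel regions rather than the exact code from the question, which the second answer argues is non-conforming; the num_threads values are arbitrary), you can print the team sizes with and without nesting enabled:

#include <omp.h>
#include <stdio.h>

int main(void)
{
    omp_set_nested(1);          /* enable nested parallelism; it is off by default */

    #pragma omp parallel num_threads(4)
    {
        int outer = omp_get_thread_num();

        #pragma omp parallel num_threads(4)
        {
            /* With nesting enabled, each of the 4 outer threads forks its own
               team of 4, so up to 16 threads (the p^2 effect) execute here.
               With omp_set_nested(0) the inner team has size 1, i.e. the inner
               region runs sequentially on the thread that reached it. */
            printf("outer %d, inner %d of %d\n",
                   outer, omp_get_thread_num(), omp_get_num_threads());
        }
    }
    return 0;
}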

- I think your answer may be wrong in the sense that it addresses a different situation from the one asked by the OP. – Massimiliano Jun 08 '13 at 04:01
- @Massimiliano Hmm... good point. But I can neither confirm nor deny the UB that you claim in your answer, since I seem to have a different interpretation of those two bullets. – Mysticial Jun 08 '13 at 04:07
- What is your interpretation? At least, do you agree that if `n%nthreads != 0` and the schedule is not `static`, then point 2 is always violated? – Massimiliano Jun 08 '13 at 04:16
- Now that I read it again in a different context, I'm leaning a bit more towards your interpretation (+1 for pointing that out). My original interpretation was that the second bullet wasn't even applicable because the second `omp for` implied nesting. It seems ambiguous at best to me; it would be nice if the standard clarified that. But I'm no language lawyer... – Mysticial Jun 08 '13 at 04:23
- OK, I finally found examples to support my interpretation (buried deep within the standard). See the edit to my answer. – Massimiliano Jun 08 '13 at 05:04
In a situation like the following:
#pragma omp parallel
{
    #pragma omp for
    for (int ii = 0; ii < n; ii++) {
        /* ... */
        #pragma omp for
        for (int jj = 0; jj < m; jj++) {
            /* ... */
        }
    }
}
what happens is that you trigger undefined behavior, because you violate the OpenMP standard. More precisely, you violate the restrictions in Section 2.5 (worksharing constructs):
The following restrictions apply to worksharing constructs:
- Each worksharing region must be encountered by all threads in a team or by none at all.
- The sequence of worksharing regions and barrier regions encountered must be the same for every thread in a team.
This is clearly shown in the examples A.39.1c and A.40.1c:
Example A.39.1c: The following example of loop construct nesting is conforming because the inner and outer loop regions bind to different parallel regions:
void work(int i, int j) {}

void good_nesting(int n)
{
    int i, j;
    #pragma omp parallel default(shared)
    {
        #pragma omp for
        for (i=0; i<n; i++) {
            #pragma omp parallel shared(i, n)
            {
                #pragma omp for
                for (j=0; j < n; j++)
                    work(i, j);
            }
        }
    }
}
Example A.40.1c: The following example is non-conforming because the inner and outer loop regions are closely nested:
void work(int i, int j) {}

void wrong1(int n)
{
    #pragma omp parallel default(shared)
    {
        int i, j;
        #pragma omp for
        for (i=0; i<n; i++) {
            /* incorrect nesting of loop regions */
            #pragma omp for
            for (j=0; j<n; j++)
                work(i, j);
        }
    }
}
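For completeness, one minimal conforming fix for the code in the question (my own sketch, not taken from the standard) is simply to drop the inner worksharing directive, so that each thread runs its own inner loop sequentially, in line with the behaviour described in the other answer:

#pragma omp parallel
{
    #pragma omp for
    for (int ii = 0; ii < n; ii++) {
        /* ... */
        /* No "#pragma omp for" here: the inner loop is an ordinary loop,
           executed sequentially by whichever thread owns this value of ii. */
        for (int jj = 0; jj < m; jj++) {
            /* ... */
        }
    }
}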
Notice that the situation in the question is different from:
#pragma omp parallel for
for (int ii = 0; ii < n; ii++) {
    /* ... */
    #pragma omp parallel for
    for (int jj = 0; jj < m; jj++) {
        /* ... */
    }
}
in which you try to spawn a nested parallel region. Only in this case does the discussion in Mysticial's answer apply.
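If you do take the nested-parallel-region route, a common precaution against the p^2 oversubscription mentioned above is to bound the team sizes explicitly. The following is only a sketch; the num_threads values and the empty work() function are placeholders:

#include <omp.h>

void work(int i, int j) {}

void nested_bounded(int n, int m)
{
    omp_set_nested(1);                          /* allow the inner region to fork */

    #pragma omp parallel for num_threads(2)     /* outer team of 2 threads */
    for (int ii = 0; ii < n; ii++) {
        #pragma omp parallel for num_threads(2) /* each outer thread forks 2 more: 4 in total */
        for (int jj = 0; jj < m; jj++) {
            work(ii, jj);
        }
    }
}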
