OpenMP and 17 Nested For-Loops

Question

I have a giant nested for-loop, designed to set a large array to its default value. I'm trying to use OpenMP for the first time to parallelize it, and have no idea where to begin. I have been reading tutorials, and am afraid the whole loop will be executed independently on each of N cores, instead of the N cores dividing the work among themselves to produce a common output. The code is in C, compiled in Visual Studio v14. Any help for this newbie is appreciated -- thanks! (Attached below is the monster nested for-loop...)

    for (j = 0; j < box1; j++)
    {
        for (k = 0; k < box2; k++)
        {
            for (l = 0; l < box3; l++)
            {
                for (m = 0; m < box4; m++)
                {
                    for (x = 0; x < box5; x++)
                    {
                        for (y = 0; y < box6; y++)
                        {
                            for (xa = 0; xa < box7; xa++)
                            {
                                for (xb = 0; xb < box8; xb++)
                                {
                                    for (nb = 0; nb < memvara; nb++)
                                    {
                                        for (na = 0; na < memvarb; na++)
                                        {
                                            for (nx = 0; nx < memvarc; nx++)
                                            {
                                                for (nx1 = 0; nx1 < memvard; nx1++)
                                                {
                                                    for (naa = 0; naa < adirect; naa++)
                                                    {
                                                        for (nbb = 0; nbb < tdirect; nbb++)
                                                        {
                                                            for (ncc = 0; ncc < fs; ncc++)
                                                            {
                                                                for (ndd = 0; ndd < bs; ndd++)
                                                                {
                                                                    for (o = 0; o < outputnum; o++)
                                                                    {
                                                                        lookup->n[j][k][l][m][x][y][xa][xb][nb][na][nx][nx1][naa][nbb][ncc][ndd][o] = -3; /* set to default value */
                                                                    }
                                                                }
                                                            }
                                                        }
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }

@HighPerformanceMark This is part of a project I was hired into to work on, so unfortunately, I cannot change languages. — Daniel R. Livingston, Jul 31 '15 at 19:49

score 1 · Accepted Answer · edited May 23 '17 at 12:29

If n is actually a multidimensional array (one contiguous block of int, not an array of pointers), you can treat it as a flat array:

    size_t i;
    size_t count = sizeof(lookup->n) / sizeof(int);
    int *p = (int*)lookup->n;
    for( i = 0; i < count; i++ )
    {
        p[i] = -3;
    }

Now, that's much easier to parallelize.
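
As a rough sketch of what that parallel version could look like under the OpenMP 2.0 that ships with Visual Studio: `fill_default` and its parameters are hypothetical names, not part of the original project, and `p` is assumed to be the `(int *)lookup->n` pointer from above. The loop index is a signed type here because OpenMP 2.0 requires a signed integer loop variable, so the `size_t` counter would need to change. `#pragma omp parallel for` divides the iterations among the threads of one team, so each element is written once rather than once per core.

```c
/* Hypothetical helper: set every element of a flat int buffer to `value`.
   `p` would be (int *)lookup->n and `count` the total number of ints. */
void fill_default(int *p, long long count, int value)
{
    long long i; /* signed: OpenMP 2.0 does not accept an unsigned loop index */

    /* The iterations are split among the threads; each index is handled by
       exactly one thread. If OpenMP is not enabled (/openmp in Visual
       Studio), the pragma is ignored and the loop simply runs serially. */
    #pragma omp parallel for
    for (i = 0; i < count; i++)
    {
        p[i] = value;
    }
}
```

A call might look like `fill_default((int *)lookup->n, (long long)(sizeof(lookup->n) / sizeof(int)), -3);`. Whether this beats the serial loop is worth measuring, since a plain fill is usually limited by memory bandwidth.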

Read more on why this works here (applies to C as well): How do I use arrays in C++?


answered Jul 31 '15 at 00:39

paddy


Except that `int *p = lookup->n;` will not compile without an explicit cast. – AnT stands with Russia Jul 31 '15 at 00:41
@AnT Have I spent too much time with C++? I was pretty sure arrays decay to pointers in C too. – paddy Jul 31 '15 at 00:43
Firstly, yes, they do decay to pointers. But in this case the array is declared as *multi-dimensional*. And in both C and C++, a multi-dimensional `int [N0][N1]...[Nn]` array decays to `int (*)[N1][N2]...[Nn]` pointer, which cannot be converted to `int *` pointer without a cast. Secondly, the same is true for C as well, which also formally requires a cast. The only reason it might work without a cast in *some* C compilers is that they allow it as an extension (and usually issue a warning). E.g. http://coliru.stacked-crooked.com/a/1310ca303c9f6e3e – AnT stands with Russia Jul 31 '15 at 00:51
I really like paddy's idea of collapsing this 17-dimensional array down to 1, and setting the values to default from there. However, as @AnT showed, this doesn't seem to work as straightforwardly in C. Is there a way in C to do this? Using multiple threads with OpenMP would be significantly easier if this were the case. – Daniel R. Livingston Jul 31 '15 at 22:51
@DanielR.Livingston Not all is lost. AnT was just pointing out that you need an explicit cast to convert the array to a pointer. I've edited my answer accordingly. As for the parallelisation, you should measure to see whether you get better performance. Sometimes the overhead of doing a task in parallel can make it worse than the naive approach. – paddy Aug 01 '15 at 11:04
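
The decay-and-cast point discussed in these comments can be illustrated with a small two-dimensional array. This is only a sketch: `N0`, `N1`, and `flat_matches_rows` are made-up names, and it assumes a true contiguous array, as in the accepted answer.

```c
#define N0 2
#define N1 3

/* Hypothetical check: returns 1 if walking the array through a flat int
   pointer visits the same elements, in the same order, as row/column
   indexing does. */
int flat_matches_rows(void)
{
    int a[N0][N1] = { {1, 2, 3}, {4, 5, 6} };
    int (*row)[N1] = a;   /* natural decay: pointer to a row of N1 ints */
    int *flat = (int *)a; /* an int * requires the explicit cast */
    int i;

    for (i = 0; i < N0 * N1; i++)
    {
        if (flat[i] != row[i / N1][i % N1])
        {
            return 0;
        }
    }
    return 1;
}
```

Writing `int *flat = a;` without the cast is the line that a conforming compiler should reject or at least warn about.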

score 1 · Answer 2 · answered Aug 01 '15 at 13:01

This is more of an extended comment than an answer.

Find the iteration limit (i.e. the variable among box1, box2, etc.) with the largest value. Revise your loop nest so that the outermost loop runs over that. Simply parallelise the outermost loop. Choosing the largest value means that you'll get, in the limit, an equal number of inner loop iterations to run for each thread.

Collapsing loops, whether you can use OpenMP's collapse clause or have to do it by hand, is only useful when you have reason to believe that parallelising over only the outermost loop will result in significant load imbalance. That seems very unlikely in this case, so distributing the work (approximately) evenly across the available threads at the outermost level would probably provide reasonably good load balancing.
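
A minimal sketch of parallelising only the outermost loop, using a three-level stand-in for the 17-level nest. The dimensions `D1`, `D2`, `D3` and the function name are hypothetical; the point is the `private` clause, which the real nest would need for all sixteen inner loop variables if they are declared outside the loops.

```c
#define D1 8 /* assumed to be the largest extent, so it goes outermost */
#define D2 4
#define D3 5

/* Hypothetical 3-level stand-in for the 17-level loop nest. */
void fill_outer_parallel(int n[D1][D2][D3], int value)
{
    int j, k, l;

    /* j is the parallel loop index, so OpenMP makes it private to each
       thread automatically. k and l are declared outside the loop, so
       they must be listed as private; otherwise the threads would share
       them and interfere with each other's inner loops. */
    #pragma omp parallel for private(k, l)
    for (j = 0; j < D1; j++)
    {
        for (k = 0; k < D2; k++)
        {
            for (l = 0; l < D3; l++)
            {
                n[j][k][l] = value;
            }
        }
    }
}
```

With the default static schedule, each thread typically receives a contiguous share of the `j` values and runs the full inner nest for each of them.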

score 0 · Answer 3 · answered Jul 31 '15 at 23:31

I believe, based on tertiary research, that the solution might be found in adding #pragma omp parallel for collapse(N) directly above the nested loops. However, the collapse clause was only introduced in OpenMP v3.0, and the whole project is based on Visual Studio (and therefore, OpenMP v2.0) for now...

Daniel R. Livingston


You could collapse the loops yourself by hand and then parallelise the outer loop with `#pragma omp parallel for`. – IKavanagh Aug 01 '15 at 11:05
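
Collapsing by hand, as suggested in the comment above, might look like the following for two loops. `ROWS`, `COLS`, and `fill_collapsed` are hypothetical names; the same division and remainder trick extends to more levels, at the cost of more index arithmetic per iteration.

```c
#define ROWS 6
#define COLS 7

/* Hypothetical 2-level example of collapsing nested loops by hand. */
void fill_collapsed(int n[ROWS][COLS], int value)
{
    int idx;

    /* A single loop over ROWS * COLS iterations replaces the two nested
       loops, so plain OpenMP 2.0 "parallel for" can split all of them. */
    #pragma omp parallel for
    for (idx = 0; idx < ROWS * COLS; idx++)
    {
        int j = idx / COLS; /* recovers the original outer index */
        int k = idx % COLS; /* recovers the original inner index */
        n[j][k] = value;
    }
}
```

Because `j` and `k` are declared inside the loop body, each thread gets its own copies and no `private` clause is needed.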
