How to safely parallel the for-loop with memcpy inside

Question

I am developing a originally serial code in KSVD package to support OpenMP. The original code, which serves like im2col in MATLAB and extracts patches from the image, is shown as follows:

/* n stands for the size of an image, sz stands for the patch size to extract */
int blocknum = 0;
for (k=0; k<=n[2]-sz[2]; k+=1) {
    for (j=0; j<=n[1]-sz[1]; j+=1) {
        for (i=0; i<=n[0]-sz[0]; i+=1) {

            /* copy single block */
            for (m=0; m<sz[2]; m++) {
                for (l=0; l<sz[1]; l++) {
                    memcpy(b + blocknum*sz[0]*sz[1]*sz[2] + m*sz[0]*sz[1] + l*sz[0], x+(k+m)*n[0]*n[1]+(j+l)*n[0]+i, sz[0]*sizeof(double));
                }
            }
            blocknum ++;
        }
    }
}

While, I would like to make it parallel by replacing the incremental blocknum with an indexing variable blockid.

/* n stands for the size of an image, sz stands for the patch size to extract */
int blockid3, blockid2, blockid;
for (k=0; k<=n[2]-sz[2]; k+=1) {
    blockid3 = k * (n[1]-sz[1]+1) * (n[0]-sz[0]+1);
#pragma omp parallel for
    for (j=0; j<=n[1]-sz[1]; j+=1) {
        blockid2 = j * (n[0]-sz[0]+1);
        for (i=0; i<=n[0]-sz[0]; i+=1) {
            blockid = i + blockid2 + blockid3;

            /* copy single block */
            for (m=0; m<sz[2]; m++) {
                for (l=0; l<sz[1]; l++) {
                    memcpy(b + blockid*sz[0]*sz[1]*sz[2] + m*sz[0]*sz[1] + l*sz[0], x+(k+m)*n[0]*n[1]+(j+l)*n[0]+i, sz[0]*sizeof(double));
                }
            }

        }
    }
}

Then running leads me to fatal segmentation error. I do not know why (according to the stack trace, it seems related to safe threading). Because I thought parallel threads should not visit the same address once. Am I supposed to set some properties of variables, i.e. static or shared or private? Here is the stack trace:

Stack Trace (from fault):
[  0] 0x00007f9bcaa695de    /usr/local/MATLAB/R2011b/bin/glnxa64/libmwfl.so+00210398 _ZN2fl4diag15stacktrace_base7capt
ureERKNS0_14thread_contextEm+000158
[  1] 0x00007f9bcaa6b62d    /usr/local/MATLAB/R2011b/bin/glnxa64/libmwfl.so+00218669
[  2] 0x00007f9bcaa6b8f5    /usr/local/MATLAB/R2011b/bin/glnxa64/libmwfl.so+00219381 _ZN2fl4diag13terminate_logEPKcRKN
S0_14thread_contextEb+000165
[  3] 0x00007f9bc9a714f5   /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00447733 _ZN2fl4diag13terminate_logEPKcPK8
ucontextb+000085
[  4] 0x00007f9bc9a6e5b4   /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00435636
[  5] 0x00007f9bc9a6f333   /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00439091
[  6] 0x00007f9bc9a6f4c7   /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00439495
[  7] 0x00007f9bc9a7085f   /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00444511
[  8] 0x00007f9bc9a70a15   /usr/local/MATLAB/R2011b/bin/glnxa64/libmwmcr.so+00444949
[  9] 0x00007f9bc89f0cb0              /lib/x86_64-linux-gnu/libpthread.so.0+00064688
[ 10] 0x00007f9bc876cb8e                    /lib/x86_64-linux-gnu/libc.so.6+01346446
[ 11] 0x00007f9b88238bb8 /home/peiyun/schmax3.0/test_im2col/mex_im2colstep.mexa64+00003000
[ 12] 0x00007f9bcb004eea    /usr/lib/gcc/x86_64-linux-gnu/4.6.3//libgomp.so+00032490
[ 13] 0x00007f9bc89e8e9a              /lib/x86_64-linux-gnu/libpthread.so.0+00032410
[ 14] 0x00007f9bc87164bd                    /lib/x86_64-linux-gnu/libc.so.6+00992445 clone+000109

By the way, if they are writing to different addresses, are there any race conditions regarding memcpy inside the omp for-loop?

It is impossible to analyze this, there are too many loops and variables with unknown values, but as a start, have you tried adding the single `#pragma omp parallel for default(shared)` before the first `for` instead, without rewriting everything? If you are never accessing the same memory block, it should be safe. — Vince, Apr 18 '14 at 19:14
Could you elaborate on KSVD ? Do you mean http://en.wikipedia.org/wiki/K-SVD ? — kebs, Apr 18 '14 at 21:07
You tagged this as C++. To me, usage of `memcpy()` in a C++ source has a lousy smell. I agree with Vinces's comment and suggest retagging this as C. — kebs, Apr 18 '14 at 21:10
I agree with @Vince and @kebs. As a side note, do the `src` and `dst` areas overlap? If so you should use `memmove()`. — Timothy Brown, Apr 19 '14 at 01:07
@Vince, I am sorry and I have edited, thanks for your advice! — Peiyun, Apr 19 '14 at 02:58
@kebs, K-SVD is what the wiki indicates. This code is from its toolbox. — Peiyun, Apr 19 '14 at 03:01
@TimothyBrown, I use the variable blockid to make sure that they do not overlap. — Peiyun, Apr 19 '14 at 03:08

Hristo Iliev · Accepted Answer · 2014-04-19T14:26:21.950

There are multiple data races in your code, namely:

/* n stands for the size of an image, sz stands for the patch size to extract */
int blockid3, blockid2, blockid;
for (k=0; k<=n[2]-sz[2]; k+=1) {
    blockid3 = k * (n[1]-sz[1]+1) * (n[0]-sz[0]+1);
#pragma omp parallel for
    for (j=0; j<=n[1]-sz[1]; j+=1) {
        blockid2 = j * (n[0]-sz[0]+1);          // <--- here
        for (i=0; i<=n[0]-sz[0]; i+=1) {        // <--- here
            blockid = i + blockid2 + blockid3;  // <--- here

            /* copy single block */
            for (m=0; m<sz[2]; m++) {           // <--- here
                for (l=0; l<sz[1]; l++) {       // <--- and here
                    memcpy(b + blockid*sz[0]*sz[1]*sz[2] + m*sz[0]*sz[1] + l*sz[0], x+(k+m)*n[0]*n[1]+(j+l)*n[0]+i, sz[0]*sizeof(double));
                }
            }

        }
    }
}

By the rules of OpenMP blockid2, i, blockid, m, and l are all implicitly shared which is not what you want. You should either make them private, or better declare them inside the parallel region and thus make them implicitly private:

#pragma omp parallel for private(i,m,l,blockid,blockid2)
...

or

int blockid3;
for (k=0; k<=n[2]-sz[2]; k+=1) {
    blockid3 = k * (n[1]-sz[1]+1) * (n[0]-sz[0]+1);
#pragma omp parallel for
    for (j=0; j<=n[1]-sz[1]; j+=1) {
        int blockid2 = j * (n[0]-sz[0]+1);
        for (int i=0; i<=n[0]-sz[0]; i+=1) {
            int blockid = i + blockid2 + blockid3;

            /* copy single block */
            for (int m=0; m<sz[2]; m++) {
                for (int l=0; l<sz[1]; l++) {
                    memcpy(b + blockid*sz[0]*sz[1]*sz[2] + m*sz[0]*sz[1] + l*sz[0], x+(k+m)*n[0]*n[1]+(j+l)*n[0]+i, sz[0]*sizeof(double));
                }
            }

        }
    }
}

The latter requires a C99-compliant compiler (because of the way loop variables are declared). Your GCC 4.6.3 requires the -std=c99 option to enable C99 compliance. If no such compiler is available (are there still non-C99 compilers in general use?), you should add the private(i,l,m) clause. You might also want to move the parallelisation to the outermost loop instead in order to minimise the OpenMP overhead.

GCC and ICC both default to GNU89 which allows (with warning) mixed declarations (along with BCPL style comments). GNU89 is better described as GNU99--. — Z boson, Apr 22 '14 at 12:34
GCC 4.8.2 with no `-std=c99` option: `c99.c:5:4: error: ‘for’ loop initial declarations are only allowed in C99 mode` — Hristo Iliev, Apr 22 '14 at 18:14
When I do `int x; x=0; int i` I get `warning: ISO C90 forbids mixed declarations and code [-Wpedantic]` but when I do `for(int i=0; ...` it gives your error. I guess mixed declarations are NOT the same as loop initial declarations. This is super annoying! I would much rather have loop initial declarations than mixed declarations. What's the point in GNU89 allowing mixed declarations but not loop initial declarations? — Z boson, Apr 22 '14 at 20:34
I'm not sure exactly when did the loop initial declaration appear for the first as a language construct in the C family. — Hristo Iliev, Apr 22 '14 at 22:12
In case you're interested http://stackoverflow.com/questions/23229872/gnu89-mixed-declarations-and-loop-initial-declarations?noredirect=1#comment35540546_23229872 — Z boson, Apr 23 '14 at 07:14
@HristoIliev, thanks! This works. I forgot to check the race conditions. — Peiyun, Apr 24 '14 at 00:38

How to safely parallel the for-loop with memcpy inside

1 Answers1