1

Can anyone help me to understand why the following code causes a segmentation fault? Likewise, can anyone help me understand why swapping out the two lines labelled "bad" for the two lines labelled "good" does not result in a segmentation fault?

Note that the seg fault seems to occur at the cudaMalloc line; if I comment that out I also do not see a segmentation fault. These allocations seem to be stepping on each other, but I don't understand how.

The intent of the code is to set up three structures: h_P on the host, which will be populated by a CPU routine d_P on the device, which will be populated by a GPU routine h_P_copy on the host, which will be populated by copying the GPU data structure back in.

That way I can verify correct behavior and benchmark one vs the other.
All of those are, indeed, four-dimensional arrays.

(If it matters, the card in question is a GTX 580, using nvcc 4.2 under SUSE Linux)

#define NUM_STATES              32
#define NUM_MEMORY              16

int main( int argc, char** argv) {

        // allocate and create P matrix
        int P_size      = sizeof(float) * NUM_STATES * NUM_STATES * NUM_MEMORY * NUM_MEMORY;
        // float *h_P      = (float*) malloc (P_size);  **good**
        // float *h_P_copy = (float*) malloc (P_size);  **good**
        float h_P[P_size];                            //  **bad**
        float h_P_copy[P_size];                       //  **bad**
        float *d_P;
        cudaMalloc( (void**) &d_P, P_size);
        cudaMemset( d_P, 0.0, P_size);

}
Novak
  • 4,687
  • 2
  • 26
  • 64
  • not familiar with cuda, but don't you need to call some sort of `synchronize` every so often? – Jimmy Lu Jun 20 '13 at 01:47
  • It looks okay to me, however perhaps those variables are being created on the stack rather than in a data segment of your program. Thus if they are too large you could be causing a seg fault? The malloc solution puts them on the heap and thus your program mysteriously works again. You could try #defining P_size rather than calculating it so that it is static and the program uses data segment rather than heap space. – dave Jun 20 '13 at 01:52

2 Answers2

3

This is likely due to stack corruption of some sort.

Notes:

  • The "good" lines allocate out of the system heap, the "bad" lines allocate stack storage.
  • Normally the amount you can allocate from the stack is quite a bit smaller than what you can allocate from the heap.
  • The "good" and "bad" declarations are not reserving the same amount of float storage. The "bad" are allocating 4x as much float storage.
  • Finally, cudaMemset, just like memset, is setting bytes and expects a unsigned char quantity, not a float (0.0) quantity.

Since the cudaMalloc line is the first one that actually "uses" (attempts to set) any of the allocated stack storage in the "bad" case, it is where the seg fault occurs. If you added an additional declaration like so:

    float *d_P;
    float myval;  //add
    myval = 0.0f; //add2
    cudaMalloc( (void**) &d_P, P_size);

I suspect you might see the seg fault occur on the "add2" line, as it would then be the first to make use of the corrupted stack storage.

Robert Crovella
  • 143,785
  • 11
  • 213
  • 257
  • Agh, I should know better than that. It's the stack/heap issue, most likely, but I was so worried about the Cuda aspect I turned off my brain for the simple C parts. – Novak Jun 20 '13 at 14:38
1

The two lines labeled good are allocating 262144 * sizeof(float) bytes. The two lines labeled bad are allocating 262144 * sizeof(float) * sizeof(float) bytes.

Tad
  • 517
  • 8
  • 30
  • Variable length stack arrays supported as GCC extension, C99, and C++11 are not supported by all compilers. Putting MBs of data on the stack is usually discouraged. See [how do I find the Maximum Stack Size](http://stackoverflow.com/questions/7535994/how-do-i-find-the-maximum-stack-size) to get more information on how to find and resize the processes stack. – Greg Smith Jun 20 '13 at 04:29