I am trying to work with 3D arrays in CUDA (200x200x100).

The moment I change my z dimension (model_num) from 4 to 5, I get a segmentation fault. Why, and how can I fix it?

const int nrcells = 200;
const int nphicells = 200;
const int model_num = 5; //So far, 4 is the maximum model_num that works. At 5 and after, there is a segmentation fault

__global__ void kernel(float* mgridb)
{
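    // Coordinates inferred from the launch configuration in main:
    // threads span phi cells, blocks.x spans r cells, blocks.y spans models.
    const int tx = threadIdx.x;
    const int ty = blockIdx.x;
    const int tz = blockIdx.y;
    const unsigned long long int i = (blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x;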

    if(tx >= 0 && tx < nphicells && ty >=0 && ty < nrcells && tz >= 0  && tz < model_num){
        //Do stuff with mgridb[i]
    }
}

int main (void)
{

    unsigned long long int size_matrices = nphicells*nrcells*model_num; 
    unsigned long long int mem_size_matrices = sizeof(float) * size_matrices;

    float *h_mgridb = (float *)malloc(mem_size_matrices);
    float mgridb[nphicells][nrcells][model_num];

    for(int k = 0; k < model_num; k++){
        for(int j = 0; j < nrcells; j++){
            for(int i = 0; i < nphicells; i++){
                mgridb[i][j][k] = 0;
            }
        }
    }
    float *d_mgridb;

    cudaMalloc( (void**)&d_mgridb, mem_size_matrices );
    cudaMemcpy(d_mgridb, h_mgridb, mem_size_matrices, cudaMemcpyHostToDevice);

    int threads = nphicells;
    uint3 blocks = make_uint3(nrcells,model_num,1);
    kernel<<<blocks,threads>>>(d_mgridb);
    cudaMemcpy( h_mgridb, d_mgridb, mem_size_matrices, cudaMemcpyDeviceToHost);
    cudaFree(d_mgridb);
    return 0;
}
John W.
  • Please pay a little more attention to formatting and content of code you post in questions. The code as you posted it was unnecessarily hard to read and contained unbalanced {}. – talonmies Jul 09 '13 at 17:53

1 Answer

This is getting stored on the stack:

float mgridb[nphicells][nrcells][model_num];

Your stack space is limited. When you exceed the amount you can store on the stack, you get a seg fault, either at the point of allocation or as soon as you try to access the array.
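To make the sizes concrete, here is a rough sketch of the arithmetic. The helper name grid_bytes is hypothetical (it is not in the original code), and the sketch assumes a 4-byte float. Default stack limits depend on the platform and are often somewhere around 1 MB to 8 MB, so where exactly the failure starts will vary between systems.

```cpp
#include <cstddef>

// Assumption: float is 4 bytes, which holds on the platforms CUDA targets.
static_assert(sizeof(float) == 4, "sketch assumes 4-byte float");

// Hypothetical helper (not in the original code): bytes needed for an
// nphicells x nrcells x models grid of floats.
constexpr std::size_t grid_bytes(std::size_t nphicells, std::size_t nrcells, std::size_t models)
{
    return nphicells * nrcells * models * sizeof(float);
}

// Sizes for the cases mentioned in the question, checked at compile time.
static_assert(grid_bytes(200, 200, 4) == 640000, "model_num = 4");
static_assert(grid_bytes(200, 200, 5) == 800000, "model_num = 5");
static_assert(grid_bytes(200, 200, 100) == 16000000, "model_num = 100");
```

The full 200x200x100 grid needs about 16 MB, which is more than a typical default stack provides.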

Use malloc instead. That allocates heap storage, which has much higher limits.

None of the above has anything to do with CUDA. Furthermore, it's not unique or specific to "3D" arrays. Any large stack-based allocation (e.g. a 1D array) is going to have the same trouble.

You may also have to adjust how you access the array, but it's not difficult to handle a flattened array using pointer indexing.
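As a minimal sketch of what that could look like, assuming the dimensions from the question: the helper names flat_index and alloc_zeroed_grid are hypothetical, and the layout (i varying fastest, then j, then k) is one possible choice. It happens to match the index the kernel computes from blockIdx and threadIdx, but it is not the same element order as the original mgridb[i][j][k] declaration.

```cpp
#include <cstddef>
#include <cstdlib>

// Dimensions copied from the question.
const int nrcells = 200;
const int nphicells = 200;
const int model_num = 5;

// Hypothetical helper: maps (i, j, k) to a flat offset, with the phi
// index i varying fastest, then the r index j, then the model index k.
constexpr std::size_t flat_index(int i, int j, int k)
{
    return static_cast<std::size_t>(i)
         + static_cast<std::size_t>(j) * nphicells
         + static_cast<std::size_t>(k) * nphicells * nrcells;
}

// Hypothetical helper: allocates the whole grid on the heap and zero-fills it.
// Returns nullptr if malloc fails; the caller releases the buffer with free().
inline float *alloc_zeroed_grid()
{
    const std::size_t count =
        static_cast<std::size_t>(nphicells) * nrcells * model_num;
    float *grid = static_cast<float *>(std::malloc(count * sizeof(float)));
    if (grid == nullptr) {
        return nullptr;
    }
    // i is the innermost loop so that writes walk through memory contiguously.
    for (int k = 0; k < model_num; k++) {
        for (int j = 0; j < nrcells; j++) {
            for (int i = 0; i < nphicells; i++) {
                grid[flat_index(i, j, k)] = 0.0f;
            }
        }
    }
    return grid;
}
```

A buffer allocated this way can be passed straight to cudaMemcpy in place of h_mgridb, and should be released with free() once it is no longer needed.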

Your code is actually strange looking, because you are creating an appropriately sized array h_mgridb using malloc and then copying that array to the device (into d_mgridb). It's not clear what purpose mgridb serves in your code. h_mgridb and mgridb are not the same.

Robert Crovella
  • This fixes the problem. I changed float mgridb[nphicells][nrcells][model_num] to float *mgridb = (float *)malloc(mem_size_matrices). Also, in the initialization "for" loop, I referenced it as a 1D array, so mgridb[i + (j*nphicells) + (k*nphicells*nrcells)] = 0. I can see what you're saying, it does not make much sense to have h_mgridb if I am already allocating mgridb. – John W. Jul 09 '13 at 18:56