cudamalloc of 2D array

Question

I'm trying to copy a 2D matrix from host to device. I wrote this:

    int dev=0;
    cudaSetDevice(dev);

    uint16_t * dev_matrix;
    size_t pitch;
    cudaMallocPitch(&dev_matrix,&pitch, 1024*sizeof(uint16_t), 65536);
    cudaMemcpy2D(dev_matrix, pitch, matrix, 1024*sizeof(uint16_t),  1024*sizeof(uint16_t), 65536, cudaMemcpyHostToDevice);
    //kernel function to implement
    cudaFree(dev_matrix);
    free (matrix);

`matrix` is a 2D uint16_t vector (1024x65536). This code gives me a segmentation fault, and I can't understand why.

Provide a complete code. Something that someone else could copy, paste, compile, and run, and see the problem, without having to add anything or change anything. And you should always be doing [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) before you start asking others for help. Why not just show in your code exactly how you are defining `matrix` instead of describing it? — Robert Crovella, Apr 13 '15 at 16:56
Your code as posted works fine for me, if I add a definition for `matrix`. My example is [here](http://pastebin.com/NxpMVwtK). If you're doing something like `std::vector<std::vector<uint16_t> > matrix;` that won't work. The input for `cudaMemcpy` needs to be a flat allocation. — Robert Crovella, Apr 13 '15 at 17:01
It is 1000 lines long! I cut it. I can't copy and paste the whole code! I assure you that everything before this part works! — Domenico, Apr 13 '15 at 17:01
@domenico: we don't want the whole code. We want the shortest, compilable, runnable example which illustrates and reproduces your problem. If you can't provide that, we can't help you — talonmies, Apr 13 '15 at 17:06
@RobertCrovella the `matrix` variable is created this way: `uint16_t **matrix = new uint16_t*[1024]; for(int h = 0; h < 1024; ++h) matrix[h] = new uint16_t[65536];` — Domenico, Apr 13 '15 at 17:06
I'm not asking for your whole code. I've already demonstrated that I can take what you've posted and make a complete program out of it with just a few more lines. Do something like that. Are you not understanding that your definition of `matrix` is important? You need to show that. Note that SO [expects](http://stackoverflow.com/help/on-topic) that you provide an [MCVE](http://stackoverflow.com/help/mcve). You have not provided an MCVE. Nobody wants to see *your whole code*. Just a short example that reproduces the issue. — Robert Crovella, Apr 13 '15 at 17:06
@domenico: that detail you posted in your comment is the *critical* piece of information to understand your problem. It should have been in your question from the beginning. — talonmies, Apr 13 '15 at 17:17

score 1 · Accepted Answer · answered Apr 13 '15 at 17:13

This cannot be used as the source of a single cudaMemcpy operation:

    uint16_t **matrix = new uint16_t*[1024];
    for(int h = 0; h < 1024; ++h) matrix[h] = new uint16_t[65536];

Each call to new in host code creates a separate allocation, and there is no guarantee that these will be contiguous or adjacent. Therefore we cannot pass a single pointer to cudaMemcpy2D and expect it to be able to discover where all the allocations are. cudaMemcpy2D expects a single, contiguous allocation.

Note that cudaMemcpy2D expects a single pointer (*) and you are passing a double pointer (**).
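
If the array-of-pointers layout has to stay for some reason, one possible workaround is to copy each row with its own cudaMemcpy call into a pitched device allocation. The following is only a sketch under assumptions: the helper name `copyRowsToDevice` and the `rows`/`cols` parameters are illustrative and not from the question, and the device layout is assumed to be 1024 rows of 65536 elements, matching the `new` loop above. It issues one transfer per row, so it will generally be slower than a single copy from a flat buffer.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: copies separately allocated host rows into one pitched
// device allocation, one cudaMemcpy per row. Returns the first CUDA error seen.
cudaError_t copyRowsToDevice(uint16_t *const *hostRows, int rows, int cols,
                             uint16_t **devMatrix, size_t *pitch)
{
    const size_t rowBytes = cols * sizeof(uint16_t);
    cudaError_t err = cudaMallocPitch((void **)devMatrix, pitch, rowBytes, rows);
    if (err != cudaSuccess) return err;

    for (int h = 0; h < rows; ++h) {
        // Row h starts h * pitch BYTES into the device allocation.
        char *dstRow = (char *)(*devMatrix) + h * (*pitch);
        err = cudaMemcpy(dstRow, hostRows[h], rowBytes, cudaMemcpyHostToDevice);
        if (err != cudaSuccess) {
            cudaFree(*devMatrix);
            *devMatrix = nullptr;
            return err;
        }
    }
    return cudaSuccess;
}

int main()
{
    const int rows = 1024, cols = 65536;

    // Same layout as in the question: every row is its own allocation.
    uint16_t **matrix = new uint16_t*[rows];
    for (int h = 0; h < rows; ++h) matrix[h] = new uint16_t[cols]();

    uint16_t *dev_matrix = nullptr;
    size_t pitch = 0;
    cudaError_t err = copyRowsToDevice(matrix, rows, cols, &dev_matrix, &pitch);
    if (err != cudaSuccess)
        fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
    else
        printf("copied %d rows, pitch = %zu bytes\n", rows, pitch);

    cudaFree(dev_matrix);
    for (int h = 0; h < rows; ++h) delete[] matrix[h];
    delete[] matrix;
    return err == cudaSuccess ? 0 : 1;
}
```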

The simplest solution is to flatten your matrix like this:

    uint16_t *matrix = new uint16_t[1024*65536];

and use index arithmetic for 2D access.
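
As a rough illustration of the flattened approach, here is a minimal sketch of a complete program. Several things in it are assumptions rather than part of the original code: the `CHECK` macro, the `flatIndex` helper, the `addOne` kernel, and the choice of 1024 rows by 65536 columns (the question's own call passes the two dimensions the other way around, which covers the same number of bytes for a flat buffer). On the host, element (row, col) lives at `row * COLS + col`; on the device, rows are `pitch` bytes apart, so the row pointer is computed with byte arithmetic.

```cuda
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative error-checking macro (not from the original post).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e_ = (call);                                      \
        if (e_ != cudaSuccess) {                                      \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(e_));                          \
            exit(1);                                                  \
        }                                                             \
    } while (0)

const int ROWS = 1024;   // assumed orientation: 1024 rows ...
const int COLS = 65536;  // ... of 65536 elements each

// Host-side 2D -> 1D index for a row-major flat array.
inline size_t flatIndex(int row, int col, int cols)
{
    return (size_t)row * cols + col;
}

// Placeholder kernel: adds 1 to every element of a pitched 2D array.
__global__ void addOne(uint16_t *data, size_t pitch, int rows, int cols)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y;
    if (row < rows && col < cols) {
        // pitch is in bytes, so step through a char pointer first.
        uint16_t *rowPtr = (uint16_t *)((char *)data + row * pitch);
        rowPtr[col] += 1;
    }
}

int main()
{
    // One contiguous, zero-initialized host allocation.
    uint16_t *matrix = new uint16_t[(size_t)ROWS * COLS]();
    matrix[flatIndex(2, 3, COLS)] = 41;

    uint16_t *dev_matrix = nullptr;
    size_t pitch = 0;
    const size_t rowBytes = COLS * sizeof(uint16_t);
    CHECK(cudaMallocPitch((void **)&dev_matrix, &pitch, rowBytes, ROWS));
    CHECK(cudaMemcpy2D(dev_matrix, pitch, matrix, rowBytes,
                       rowBytes, ROWS, cudaMemcpyHostToDevice));

    dim3 block(256);
    dim3 grid((COLS + block.x - 1) / block.x, ROWS);
    addOne<<<grid, block>>>(dev_matrix, pitch, ROWS, COLS);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaMemcpy2D(matrix, rowBytes, dev_matrix, pitch,
                       rowBytes, ROWS, cudaMemcpyDeviceToHost));
    printf("matrix[2][3] = %u\n", (unsigned)matrix[flatIndex(2, 3, COLS)]);

    CHECK(cudaFree(dev_matrix));
    delete[] matrix;
    return 0;
}
```

Because the host buffer is contiguous, its source pitch is simply the row width in bytes, which is what makes a single cudaMemcpy2D call valid here.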

answered Apr 13 '15 at 17:13

Robert Crovella

We really ought to find one of the hundreds of these copy array of pointer questions, clean it up, and use it as the default duplicate close. This must come up at least once a week.... – talonmies Apr 13 '15 at 17:15
I've thought about that too (*many* times). Every time I think about it, I start to get lost in all the possible variations and recommendations, and I'm unable to come up with anything concise. – Robert Crovella Apr 13 '15 at 17:16
@talonmies you wrote a *superb* canonical error checking question. And in my opinion you wrote *the canonical* 2D array access answer [here](http://stackoverflow.com/questions/6137218/how-can-i-add-up-two-2d-pitched-arrays-using-nested-for-loops). (Perhaps I should have marked this as a dupe of that.) If you want to write a new question/answer I would upvote and I'm sure others would. I know it goes against your current standing, so if you wanted to edit one of your previous answers that would work too. I'm open to suggestions. – Robert Crovella Apr 13 '15 at 17:31
