How will the cudaMemcpy function work in this case?

I have declared a matrix like this:

float imagen[par->N][par->M];

and I want to copy it to the CUDA device, so I did this:

float *imagen_cuda;

int tam_cuda=par->M*par->N*sizeof(float);

cudaMalloc((void**) &imagen_cuda,tam_cuda); 
cudaMemcpy(imagen_cuda,imagen,tam_cuda,cudaMemcpyHostToDevice);

Will this copy the 2D array into a 1D array correctly?

And how can I copy it to another 2D array? Can I change the declaration to the following, and will it work?

float **imagen_cuda;
Atirag
  • 1
    First, why are you using the stack for your 2D array? Is it always going to be small? Second, 2D arrays are stored in a contiguous memory block, so indeed you could use a single `cudaMemcpy()` to copy the data to a 1D device array. As for 2D CUDA arrays, it is a bit more complicated. You can find some information and examples on Stack Overflow (e.g. [here](http://stackoverflow.com/a/9974989/1043187)). – BenC May 17 '13 at 01:43
  • Also, are your N and M known during compilation? – BenC May 17 '13 at 01:51
  • Yes they are known and the 2d array could be like 1024*1024 or bigger – Atirag May 17 '13 at 15:58
  • Thanks for the link I'll check it out – Atirag May 17 '13 at 15:59
  • Sorry, my mistake: M and N are not known at compile time. They depend on the width and height of the image being loaded into the program. – Atirag May 17 '13 at 16:05
  • I read some of the material at the link along with the programming guide. If I understand correctly, since my image is stored as floats, the data alignment is handled automatically: "The alignment requirement is automatically fulfilled for the built-in types of char, short, int, long, longlong, float, double like float2 or float4." So a 1D array will work fine, right? Or is it still better to arrange it as a 2D array? – Atirag May 17 '13 at 17:54
  • 1
    look at cudaMemcpy2D function – T_T May 17 '13 at 18:07

1 Answer

Handling a doubly-subscripted C array is not trivial when copying data between host and device. For the most part, the cudaMemcpy family of functions (including cudaMemcpy2D) expects an ordinary pointer for source and destination, not a pointer-to-pointer.

The simplest approach (I think) is to "flatten" the 2D arrays, both on host and device, and use index arithmetic to simulate 2D coordinates:

float imagen[par->N][par->M];
float *myimagen = &(imagen[0][0]);
// element (row, col): each row holds par->M floats, so the row size is par->M
float myval = myimagen[(par->M*row) + col];
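
To make the index arithmetic concrete, here is a minimal host-only sketch in plain C. The helper names `flat_index` and `flat_view_matches` are made up for illustration, and the constants `N_ROWS` and `N_COLS` merely stand in for par->N and par->M. It assumes row-major storage, which is how C lays out a 2D array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of element (row, col) in a row-major
   array that holds n_cols elements per row. */
static size_t flat_index(size_t row, size_t col, size_t n_cols)
{
    assert(col < n_cols);
    return row * n_cols + col;
}

/* Fills a small 2D array, then reads one element back through a flat
   pointer. Returns 1 if the flat view and the 2D view agree. */
static int flat_view_matches(size_t row, size_t col)
{
    enum { N_ROWS = 3, N_COLS = 4 };   /* stand-ins for par->N and par->M */
    float imagen[N_ROWS][N_COLS];
    size_t r, c;

    assert(row < N_ROWS && col < N_COLS);
    for (r = 0; r < N_ROWS; ++r)
        for (c = 0; c < N_COLS; ++c)
            imagen[r][c] = (float)(10 * r + c);

    const float *myimagen = &imagen[0][0];
    return myimagen[flat_index(row, col, N_COLS)] == imagen[row][col];
}
```

The same `row * n_cols + col` expression can be used inside a kernel on the device copy, since the bytes are transferred unchanged.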

You can then use ordinary cudaMemcpy operations to handle the transfers (using the myimagen pointer):

float *d_myimagen;
cudaMalloc((void **)&d_myimagen, (par->N * par->M)*sizeof(float));
cudaMemcpy(d_myimagen, myimagen, (par->N * par->M)*sizeof(float), cudaMemcpyHostToDevice);
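
The same transfer should also work when the host image lives on the heap, which may be safer than a stack array for sizes such as the 1024*1024 mentioned in the comments. The sketch below is host-only C under that assumption. `image_bytes` and `alloc_flat_image` are hypothetical helper names, and the CUDA call appears only in a comment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: byte size of an n_rows x n_cols float image.
   Uses size_t rather than int; returns 0 for an empty image or if the
   product would overflow size_t. */
static size_t image_bytes(size_t n_rows, size_t n_cols)
{
    if (n_cols != 0 && n_rows > SIZE_MAX / sizeof(float) / n_cols)
        return 0;
    return n_rows * n_cols * sizeof(float);
}

/* Hypothetical helper: allocates one contiguous, zero-filled block on
   the heap. The caller frees it. Returns NULL on failure. */
static float *alloc_flat_image(size_t n_rows, size_t n_cols)
{
    size_t bytes = image_bytes(n_rows, n_cols);
    if (bytes == 0)
        return NULL;
    return (float *)calloc(1, bytes);
}

/* The returned pointer h_img can then serve as the copy source, e.g.:
   cudaMemcpy(d_myimagen, h_img, image_bytes(N, M), cudaMemcpyHostToDevice); */
```

Element (row, col) of such a buffer is still reached as `h_img[row * n_cols + col]`, so nothing changes on the device side.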

If you really want to handle dynamically sized (i.e. not known at compile time) doubly-subscripted arrays, you can review this question/answer.

Robert Crovella
  • No, I think I can handle the 1d array using index arithmetic just fine since the memory allocation will make no difference. So thanks! – Atirag May 17 '13 at 21:53
  • cudaMalloc takes a double pointer. It should be `cudaMalloc((void **)&d_myimagen ...` – VforVitamin Dec 19 '18 at 23:38