
Possible Duplicate:
Copying a struct containing pointers to CUDA device

I have a structure of device pointers, pointing to arrays allocated on the device, like this:

struct mystruct {
    int* dev1;
    double* dev2;
    // ... many more device pointers
};

There are a large number of arrays in this structure. I started writing a CUDA kernel in which I passed a pointer to mystruct and then dereferenced it within the kernel code, like this: mystruct->dev1[i].

But after writing a few lines I realized that this will not work, since by CUDA first principles you cannot dereference a host pointer (in this case, the pointer to mystruct) within a CUDA kernel.

But this is inconvenient, since I would have to pass a large number of arguments to my kernels instead. Is there any way to avoid this? I would like to keep the argument lists of my kernel calls as short as possible.

curiousexplorer

3 Answers


As I explain in this answer, you can pass your struct by value to the kernel, so you don't have to worry about dereferencing a host pointer:

__global__ void kernel(mystruct in)
{
  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  in.dev1[idx] *= 2;
  in.dev2[idx] += 3.14159;
}

There is some overhead in passing the struct by value, which you should be aware of. However, if your struct is not too large, it shouldn't matter.

If you pass the same struct to a lot of kernels, or repeatedly, you may consider copying the struct itself to global or constant memory instead, as suggested by aland, or using mapped host memory, as suggested by Mark Ebersole. But passing the struct by value is a much simpler way to get started.
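For completeness, here is a minimal sketch of the host-side setup for the by-value approach; the array size N and the error handling are assumptions for illustration, not part of the original answer:

```cuda
#include <cuda_runtime.h>

int main()
{
    const int N = 256;   // illustrative size

    mystruct s;                               // the struct itself lives on the host
    cudaMalloc(&s.dev1, N * sizeof(int));     // but its members point to device memory
    cudaMalloc(&s.dev2, N * sizeof(double));
    cudaMemset(s.dev1, 0, N * sizeof(int));
    cudaMemset(s.dev2, 0, N * sizeof(double));

    // The struct is copied by value into the kernel's argument space,
    // so the kernel never dereferences a host pointer.
    kernel<<<1, N>>>(s);
    cudaDeviceSynchronize();

    cudaFree(s.dev1);
    cudaFree(s.dev2);
    return 0;
}
```

Only the struct (a handful of pointers) is copied per launch; the arrays it points to stay on the device.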

(Note: please search StackOverflow before duplicating questions...)

harrism

You can copy your mystruct structure to global memory and pass its device address to the kernel.

From a performance viewpoint, however, it would be better to store mystruct in constant memory, since (I guess) it is read many times by many threads.
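A minimal sketch of the constant-memory variant, assuming the same mystruct as in the question (the symbol and launch names are illustrative):

```cuda
#include <cuda_runtime.h>

__constant__ mystruct c_struct;   // one copy of the pointer struct in constant memory

__global__ void kernel_const()
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    // Every thread reads the same pointers, so the constant cache
    // broadcasts them cheaply; the arrays remain in global memory.
    c_struct.dev1[idx] *= 2;
    c_struct.dev2[idx] += 3.14159;
}

void launch(const mystruct& h_struct, int n)
{
    // Copy only the small struct of pointers, once, before launching.
    cudaMemcpyToSymbol(c_struct, &h_struct, sizeof(mystruct));
    kernel_const<<<(n + 255) / 256, 256>>>();
}
```

Note the kernel now takes no struct argument at all; it reads the pointers from the constant symbol.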

aland

You could also use page-locked (pinned) host memory and create the structure within that region, if your setup supports it. Please see Section 3.2.4 of the CUDA programming guide.
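A sketch of this mapped (zero-copy) approach, with illustrative names and sizes; it requires a device that reports canMapHostMemory:

```cuda
#include <cuda_runtime.h>

mystruct* make_mapped_struct()
{
    // On older CUDA versions, mapping must be enabled before any allocation:
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Allocate the struct itself in page-locked, mapped host memory.
    mystruct* h_ptr = nullptr;
    cudaHostAlloc(&h_ptr, sizeof(mystruct), cudaHostAllocMapped);

    // Fill in the device-array members as usual.
    cudaMalloc(&h_ptr->dev1, 256 * sizeof(int));
    cudaMalloc(&h_ptr->dev2, 256 * sizeof(double));

    // Get the device-side alias of the same allocation; a kernel can
    // dereference this pointer directly (reads cross the PCIe bus).
    mystruct* d_ptr = nullptr;
    cudaHostGetDevicePointer(&d_ptr, h_ptr, 0);
    return d_ptr;   // pass this pointer to the kernel
}
```

The struct stays in host memory, so kernel reads of it are slower than constant or global memory, but you avoid any explicit copy of the struct.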

Mark Ebersole