
Possible Duplicate:
Copying a struct containing pointers to CUDA device

I have a structure of device pointers, pointing to arrays allocated on the device, like this:

struct mystruct {
    int* dev1;
    double* dev2;
    // ... many more device pointers
};

There are a large number of arrays in this structure. I started writing a CUDA kernel in which I passed a pointer to mystruct and then dereferenced it within the kernel code, like this: mystruct->dev1[i].

But after writing a few lines I realized that this will not work, since by CUDA first principles you cannot dereference a host pointer (in this case, the pointer to mystruct) within a CUDA kernel.

But this is inconvenient, since I would have to pass a large number of arguments to my kernels instead. Is there any way to avoid this? I would like to keep the argument lists of my kernel calls as short as possible.

curiousexplorer

3 Answers


As I explain in this answer, you can pass your struct by value to the kernel, so you don't have to worry about dereferencing a host pointer:

__global__ void kernel(mystruct in)
{
  int idx = threadIdx.x + blockIdx.x * blockDim.x;
  in.dev1[idx] *= 2;
  in.dev2[idx] += 3.14159;
}

There is some overhead in passing the struct by value, which you should be aware of. However, if your struct is not too large, it shouldn't matter.

If you pass the same struct to a lot of kernels, or repeatedly, you may consider copying the struct itself to global or constant memory instead, as suggested by aland, or using mapped host memory, as suggested by Mark Ebersole. But passing the struct by value is a much simpler way to get started.
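For completeness, here is a minimal sketch of the host-side setup for the by-value approach; the array size N and the error handling are assumptions for illustration, not part of the original answer:

```cuda
#include <cuda_runtime.h>

int main()
{
    const int N = 256;   // illustrative size

    mystruct s;                               // the struct itself lives on the host
    cudaMalloc(&s.dev1, N * sizeof(int));     // but its members point to device memory
    cudaMalloc(&s.dev2, N * sizeof(double));
    cudaMemset(s.dev1, 0, N * sizeof(int));
    cudaMemset(s.dev2, 0, N * sizeof(double));

    // The struct is copied by value into the kernel's argument space,
    // so the kernel never dereferences a host pointer.
    kernel<<<1, N>>>(s);
    cudaDeviceSynchronize();

    cudaFree(s.dev1);
    cudaFree(s.dev2);
    return 0;
}
```

Only the struct (a handful of pointers) is copied per launch; the arrays it points to stay on the device.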

(Note: please search StackOverflow before duplicating questions...)

harrism

You can copy your mystruct structure to global memory and pass its device address to the kernel.

From a performance viewpoint, however, it would be better to store mystruct in constant memory, since (I guess) it is read many times by many threads.
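A minimal sketch of the constant-memory variant, assuming the same mystruct as in the question (the symbol and launch names are illustrative):

```cuda
#include <cuda_runtime.h>

__constant__ mystruct c_struct;   // one copy of the pointer struct in constant memory

__global__ void kernel_const()
{
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    // Every thread reads the same pointers, so the constant cache
    // broadcasts them cheaply; the arrays remain in global memory.
    c_struct.dev1[idx] *= 2;
    c_struct.dev2[idx] += 3.14159;
}

void launch(const mystruct& h_struct, int n)
{
    // Copy only the small struct of pointers, once, before launching.
    cudaMemcpyToSymbol(c_struct, &h_struct, sizeof(mystruct));
    kernel_const<<<(n + 255) / 256, 256>>>();
}
```

Note the kernel now takes no struct argument at all; it reads the pointers from the constant symbol.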

aland

You could also use page-locked (pinned) host memory and create the structure within that region, if your setup supports it. Please see Section 3.2.4 of the CUDA programming guide.
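A sketch of this mapped (zero-copy) approach, with illustrative names and sizes; it requires a device that reports canMapHostMemory:

```cuda
#include <cuda_runtime.h>

mystruct* make_mapped_struct()
{
    // On older CUDA versions, mapping must be enabled before any allocation:
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Allocate the struct itself in page-locked, mapped host memory.
    mystruct* h_ptr = nullptr;
    cudaHostAlloc(&h_ptr, sizeof(mystruct), cudaHostAllocMapped);

    // Fill in the device-array members as usual.
    cudaMalloc(&h_ptr->dev1, 256 * sizeof(int));
    cudaMalloc(&h_ptr->dev2, 256 * sizeof(double));

    // Get the device-side alias of the same allocation; a kernel can
    // dereference this pointer directly (reads cross the PCIe bus).
    mystruct* d_ptr = nullptr;
    cudaHostGetDevicePointer(&d_ptr, h_ptr, 0);
    return d_ptr;   // pass this pointer to the kernel
}
```

The struct stays in host memory, so kernel reads of it are slower than constant or global memory, but you avoid any explicit copy of the struct.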

Mark Ebersole