5

Hey there, I have the following piece of code:

#if USE_CONST == 1
    __constant__ double PNT[ SIZE ];    
#else
    __device__ double *PNT;
#endif

and a bit later I have:

#if USE_CONST == 0
    cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
    cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
#else
    cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
#endif

where point is defined earlier in the code. With USE_CONST=1 everything works as expected, but with USE_CONST=0 it does not. I access the array in my kernel function via

PNT[ index ]

What is the difference between the two variants that causes this? Thanks!

caf
tim

2 Answers

3

The correct usage of cudaMemcpyToSymbol prior to CUDA 4.0 is:

cudaMemcpyToSymbol("PNT", point, sizeof(double)*SIZE)

or alternatively:

double *cpnt;
cudaGetSymbolAddress((void **)&cpnt, "PNT");
cudaMemcpy(cpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

which might be a bit faster if you are planning to access the symbol from the host API more than once.

EDIT: I misunderstood the question. For the global memory version, do something similar to the second version for constant memory:

double *gpnt;
cudaGetSymbolAddress((void **)&gpnt, "PNT");
cudaMemcpy(gpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
talonmies
  • Okay, first of all, thanks. The strange thing is that it works even without the quotation marks... But what I was asking about was the access when `USE_CONST=1` is not set, which means that the block with the global memory is used... `__device__ double *PNT;` etc... Then it doesn't work correctly :( Is there anything wrong with it? – tim May 17 '11 at 23:49
  • 5
    talonmies' answer for how to do it is correct (so I upvoted), but it lacks an explanation. The explanation is: `PNT` is a `__device__` variable, not a host variable containing the address of a device variable. (Confusing, I know.) Therefore, if you try to access it on the host, as with `(void**)&PNT`, you are trying to read a device variable from the host, which is not permitted. From the host code's point of view it's just a symbol, so you need to use `cudaGetSymbolAddress()` to store the device address in a host variable that you can then pass to `cudaMemcpyToSymbol()`, as @talonmies shows. – harrism May 18 '11 at 03:20
  • Thanks, that's a good explanation :) And thanks @talonmies too :) – tim May 18 '11 at 09:52
  • 2
    For future readers: The string version is deprecated, which is why the version without the quotation marks works (and is preferred). The usage as shown here won't work correctly! `cudaMemcpyToSymbol` already uses `cudaGetSymbolAddress` internally, so using it twice will break – Flamefire Dec 03 '15 at 10:31
2

Although this is an old question, I am adding this for future googlers:

The problem is here:

cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);

The cudaMalloc call writes to the host-side version of PNT, but PNT is actually a device variable that must not be accessed from the host. The correct approach is to allocate memory, copy its address to the device symbol, and then copy the data to the memory pointed to by that symbol:

void* memPtr;
cudaMalloc(&memPtr, sizeof(double)*SIZE);
cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr));
// In other places you'll need an additional:
// cudaMemcpyFromSymbol(&memPtr, PNT, sizeof(memPtr));
cudaMemcpy(memPtr, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
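
To make the pointer approach above concrete, here is a minimal, self-contained sketch. The value of SIZE, the test data, the kernel readPnt, and the helper pointerVariantWorks are all assumptions made up for illustration; they are not from the original question. It is only meant to show the allocate / publish address / copy data sequence end to end, not a definitive implementation.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 8  // assumed array length for this sketch

__device__ double *PNT;  // device-side pointer; the host must not write it directly

// Hypothetical kernel: copies PNT[i] into out[i] so the host can inspect it.
__global__ void readPnt(double *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < SIZE) out[i] = PNT[i];
}

// Hypothetical helper: returns true if the data round-trips through PNT unchanged.
bool pointerVariantWorks()
{
    const size_t bytes = sizeof(double) * SIZE;
    double point[SIZE], result[SIZE];
    for (int i = 0; i < SIZE; ++i) { point[i] = 1.5 * i; result[i] = -1.0; }

    double *memPtr = nullptr;  // host variable holding a device address
    double *out = nullptr;

    bool ok =
        cudaMalloc((void **)&memPtr, bytes) == cudaSuccess &&
        cudaMalloc((void **)&out, bytes) == cudaSuccess &&
        // Publish the address to the device symbol (copies the pointer value only).
        cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr)) == cudaSuccess &&
        // Copy the actual data into the buffer the symbol now points to.
        cudaMemcpy(memPtr, point, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        readPnt<<<1, SIZE>>>(out);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(result, out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    for (int i = 0; ok && i < SIZE; ++i) ok = (result[i] == point[i]);

    cudaFree(out);     // cudaFree on a null pointer is a no-op
    cudaFree(memPtr);
    return ok;
}

int main()
{
    bool ok = pointerVariantWorks();
    printf("pointer variant %s\n", ok ? "ok" : "FAILED");
    return ok ? 0 : 1;
}
```

Note that the host keeps its own copy of the address in memPtr, which is why the later cudaMemcpy and cudaFree calls never need to read PNT back from the device.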

Easier would be:

#if USE_CONST == 1
    __constant__ double PNT[ SIZE ];    
#else
    __device__ double PNT[ SIZE ];
#endif

// No #if required anymore:
cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
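
As a rough check of the statically sized variant, the following sketch copies data to the symbol and reads it straight back with cudaMemcpyFromSymbol. SIZE, the default of USE_CONST, the test values, and the helper arrayVariantWorks are assumptions for illustration only.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define SIZE 8        // assumed array length for this sketch
#ifndef USE_CONST
#define USE_CONST 0   // assumed default; build with -DUSE_CONST=1 for constant memory
#endif

#if USE_CONST == 1
__constant__ double PNT[SIZE];
#else
__device__ double PNT[SIZE];
#endif

// Hypothetical helper: returns true if the data round-trips through the symbol.
bool arrayVariantWorks()
{
    double point[SIZE], readBack[SIZE];
    for (int i = 0; i < SIZE; ++i) { point[i] = 0.25 * i; readBack[i] = -1.0; }

    // The same call works for both memory spaces, so no #if is needed here.
    if (cudaMemcpyToSymbol(PNT, point, sizeof(double) * SIZE) != cudaSuccess)
        return false;
    if (cudaMemcpyFromSymbol(readBack, PNT, sizeof(double) * SIZE) != cudaSuccess)
        return false;

    for (int i = 0; i < SIZE; ++i)
        if (readBack[i] != point[i]) return false;
    return true;
}

int main()
{
    bool ok = arrayVariantWorks();
    printf("array variant %s\n", ok ? "ok" : "FAILED");
    return ok ? 0 : 1;
}
```

The trade-off is that SIZE must be known at compile time; if the length is only known at run time, the pointer-based version is still needed.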
Flamefire