
I'm writing a CUDA application that has a step where the variance of some complex-valued input data is computed, and then that variance is used to threshold the data. I've got a reduction kernel that computes the variance for me, but I'm not sure if I have to pull the value back to the host to pass it to the thresholding kernel or not.

Is there a way to pass the value directly from device memory?

gct

1 Answer


You can use a __device__ variable to hold the variance value between kernel calls.

Put this before the definition of the kernels that use it:

__device__ float my_variance = 0.0f;

Variables defined this way can be used by any kernel executing on the device (without requiring that they be explicitly passed as a kernel function parameter) and persist for the lifetime of the context, i.e. beyond the lifetime of any single kernel call.
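As a rough illustration of that pattern, here is a minimal sketch. The kernel names, the single-thread variance loop, and the thresholding rule are all hypothetical stand-ins (your real reduction will look different), and I'm using plain float data rather than complex values to keep it short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Lives in device global memory and persists between kernel launches.
__device__ float my_variance = 0.0f;

// Hypothetical stand-in for a real reduction: one thread computes the
// population variance of n values and stores it in the __device__ variable.
__global__ void variance_kernel(const float *data, int n)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        float mean = 0.0f, acc = 0.0f;
        for (int i = 0; i < n; ++i) mean += data[i];
        mean /= n;
        for (int i = 0; i < n; ++i) acc += (data[i] - mean) * (data[i] - mean);
        my_variance = acc / n;
    }
}

// Hypothetical thresholding rule (an assumption for illustration):
// zero every sample whose square is below the stored variance.
__global__ void threshold_kernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] * data[i] < my_variance) data[i] = 0.0f;
}

int main()
{
    const int n = 8;
    float h_data[n] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
    float *d_data = nullptr;
    cudaMalloc((void **)&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Both launches go to the default stream, so the second kernel
    // starts only after the first has finished writing my_variance.
    variance_kernel<<<1, 1>>>(d_data, n);
    threshold_kernel<<<1, 32>>>(d_data, n);

    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g ", h_data[i]);
    printf("\n");

    cudaFree(d_data);
    return 0;
}
```

Note that the variance never travels to the host here; only the thresholded data is copied back.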

It's not entirely clear from your question, but you can also define an array of data this way.

__device__ float my_variance[32] = {0.0f};

Likewise, allocations created by cudaMalloc live for the duration of the application/context (or until an appropriate cudaFree is encountered) and so there is no need to "pull back the data" to the host if you want to use it in a successive kernel:

float *d_variance;
cudaMalloc((void **)&d_variance, sizeof(float));
my_reduction_kernel<<<...>>>(..., d_variance, ...);
my_thresholding_kernel<<<...>>>(..., d_variance, ...);

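Any value set in *d_variance by the reduction kernel above will be properly observed by the thresholding kernel, provided the two kernels are launched into the same stream (as they are here) or are otherwise synchronized.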
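For completeness, here is a minimal sketch of the cudaMalloc approach as a full program. The kernel bodies, the argument lists, and the thresholding rule are assumptions made up for illustration; only the names my_reduction_kernel, my_thresholding_kernel, and d_variance come from the snippet above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the reduction: one thread computes the
// population variance and writes it through the device pointer.
__global__ void my_reduction_kernel(const float *data, int n, float *variance)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        float mean = 0.0f, acc = 0.0f;
        for (int i = 0; i < n; ++i) mean += data[i];
        mean /= n;
        for (int i = 0; i < n; ++i) acc += (data[i] - mean) * (data[i] - mean);
        *variance = acc / n;
    }
}

// Hypothetical thresholding rule: zero samples whose square is below
// the variance read from device memory.
__global__ void my_thresholding_kernel(float *data, int n, const float *variance)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && data[i] * data[i] < *variance) data[i] = 0.0f;
}

int main()
{
    const int n = 8;
    float h_data[n] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
    float *d_data = nullptr, *d_variance = nullptr;

    if (cudaMalloc((void **)&d_data, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc((void **)&d_variance, sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // The variance stays in device memory between the two launches.
    my_reduction_kernel<<<1, 1>>>(d_data, n, d_variance);
    my_thresholding_kernel<<<1, 32>>>(d_data, n, d_variance);

    cudaMemcpy(h_data, d_data, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g ", h_data[i]);
    printf("\n");

    cudaFree(d_variance);
    cudaFree(d_data);
    return 0;
}
```

The only host-side traffic is the input upload and the result download; d_variance is just a handle that both kernels dereference on the device.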

Robert Crovella
  • This seems to answer what I wanted. I knew cudaMalloc could do it but I didn't want to pass a pointer for a single value. By passing it by value to the kernel it'll get put in shared memory automatically for me I believe. – gct Oct 04 '13 at 16:43
  • a __device__ var can't be assigned using `var=...;` after initialization, right? – zhangxaochen Oct 11 '15 at 20:35
  • Yes, it can. Assuming you haven't marked it `const`. – Robert Crovella Oct 11 '15 at 20:36
  • I'm resetting a `__device__` var to zero (`var=0;`) and following some calculations in kernel function, why the resetting doesn't take effect? – zhangxaochen Oct 11 '15 at 21:01
  • You can't do `var=0;` in host code. You have to do it in device code, for a `__device__` variable. Or else you have to set var to zero using a `cudaMemcpyToSymbol` operation. – Robert Crovella Oct 11 '15 at 21:23
  • Replace that `var = 0;` line with `int temp_var = 0; cudaMemcpyToSymbol(var, &temp_var, sizeof(int));` – Robert Crovella Oct 11 '15 at 21:29
  • Great!!! I have tried `cudaMemcpyFromSymbol` before, yet I used `&var` instead, which caused `invalid device symbol` and made me give up that way... – zhangxaochen Oct 11 '15 at 21:33
  • For others info., this describes exactly what I was doing wrong: http://stackoverflow.com/questions/26075972/cudamemcpyfromsymbol-on-a-device-variable – zhangxaochen Oct 11 '15 at 21:36
  • btw, is there any good chatting room for cuda (e.g., irc) where the cuda guys being active in? I see #cuda @freenode, while that channel is not quite active (I may delete this comment latter) – zhangxaochen Oct 11 '15 at 21:40
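
To summarize the comment exchange about resetting a `__device__` variable from host code, here is a small sketch. The variable name `var` follows the comments; the `add_one` kernel is a hypothetical placeholder for "some calculations". The key point is that the symbol itself is passed to `cudaMemcpyToSymbol` and `cudaMemcpyFromSymbol`, not `&var`.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ int var = 5;

// Hypothetical kernel standing in for "some calculations" on var.
__global__ void add_one()
{
    var += 1;
}

int main()
{
    // Writing `var = 0;` in host code does not touch the device copy.
    // Reset it through the symbol API instead (pass var, not &var).
    int temp_var = 0;
    cudaError_t err = cudaMemcpyToSymbol(var, &temp_var, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyToSymbol: %s\n", cudaGetErrorString(err));
        return 1;
    }

    add_one<<<1, 1>>>();

    // Read the value back the same way; this copy waits for the kernel
    // launched above on the default stream to finish.
    int result = -1;
    err = cudaMemcpyFromSymbol(&result, var, sizeof(int));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMemcpyFromSymbol: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("var = %d\n", result);
    return 0;
}
```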