Generating Uniform Double random numbers on device in CUDA

Question

I would like to generate uniform random numbers on the device, to be used inside of a device function. Each thread should generate a different uniform random number. I have this code, but I get a segmentation fault.

int main{
  curandStateMtgp32 *devMTGPStates;
  mtgp32_kernel_params *devKernelParams;

  cudaMalloc((void **)&devMTGPStates, NUM_THREADS*NUM_BLOCKS * sizeof(curandStateMtgp32));
  cudaMalloc((void**)&devKernelParams,sizeof(mtgp32_kernel_params));

  curandMakeMTGP32Constants(mtgp32dc_params_fast_11213, devKernelParams);
  curandMakeMTGP32KernelState(devMTGPStates,
    mtgp32dc_params_fast_11213, devKernelParams,NUM_BLOCKS*NUM_THREADS, 1234);

  doHenry <<NUM_BLOCKS,NUM_THREADS>>> (devMTGPStates);
}

Inside my global function doHenry, evaluated on the device, I put:

double rand1 = curand_uniform_double(&state[threadIdx.x+NUM_THREADS*blockIdx.x]);

Is this the best way to generate a random number per thread? I don't understand what the devKernelParams is doing, but I know I need one state per thread, right?

It's handy when posting questions like this if you identify for your readers which line is giving you the seg fault. And if you don't know - please do that level of trivial debugging. Also, you should be [checking your cuda API calls](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) and curand calls with error checking. Curand error checking is covered in the [documentation](http://docs.nvidia.com/cuda/curand/index.html#topic_1_2_3). Yes, if you want independent random sequences per thread, you need one state per thread. — Robert Crovella, Jun 23 '13 at 23:02

score 2 · Accepted Answer · answered Jun 24 '13 at 04:33

I think you're getting the seg fault on this line:

curandMakeMTGP32KernelState(devMTGPStates,  mtgp32dc_params_fast_11213, devKernelParams,NUM_BLOCKS*NUM_THREADS, 1234);

I believe the reason for the seg fault is because you have exceeded 200 for the n parameter, for which you are passing NUM_BLOCKS*NUM_THREADS. I tried a version of your code, and I was able to reproduce the seg fault at around n=540.

The MT generator has a limitation on the amount of states it can set up when using pre-generated kernel parameters (mtgp32dc_params_fast_11213). You may wish to read the relevant section of the documentation. (Bit Generation with the MTGP32 generator)

I'm not really an expert on CURAND, but other generators (such as XORWOW) don't have this type of limitation, so if you want to generate a large amount of independent thread state easily, consider one of the other generators. Using the particular approach you have outlined, the MTGP32 generator seems to be limited to about 200*256 independent thread generation. Contrary to what I said in the comments (which is true for other generator types) the MTGP32 state seems to be sufficient at one state for a block of up to 256 threads. And the example given in the documentation (refer to the second example) uses that type of state generation and threadblock hierarchy.

Thank you. I was thinking that I need one state per thread; but it seems that one state per block is sufficient for generating a different uniform number on each thread, independent of all of the other blocks, from the example you showed me. — Cokes, Jun 25 '13 at 17:48

Generating Uniform Double random numbers on device in CUDA

1 Answers1