I am new to CUDA and I was wondering if I could do something like this:
#include <curand_kernel.h>   // for curandState, curand_uniform

__global__ void MCkernel(curandState* globalState, int* jumpGPU,
                         int* nghPtrGPU, int* nghOffset)
{
    // global thread index
    int idx = threadIdx.x + blockDim.x * blockIdx.x;

    // load this thread's cuRAND state, draw a uniform in (0,1), store the state back
    curandState localState = globalState[idx];
    float randP = curand_uniform(&localState);
    globalState[idx] = localState;

    // ranges vary by thread index: jumpGPU[idx] points into nghPtrGPU
    int ptr2ngh = jumpGPU[idx];
    int min = (int)nghPtrGPU[ptr2ngh];
    int max = (int)nghPtrGPU[ptr2ngh + 1];

    // map the random draw into this thread's neighbour index range
    nghOffset[idx] = min + (int)truncf(randP * (max - min - 1) + min + 0.5f);
}
where I use the jumpGPU[idx] value to index into nghPtrGPU, i.e. nghPtrGPU[jumpGPU[idx]]. If so, what am I doing wrong here? The kernel above outputs the correct randP and ptr2ngh, but not the correct nghOffset array. Any help would be appreciated. Thanks!
Sample Output:
idx 0: randP:0.200745,ptr2ngh:25 --> nghOffset -2031558532.
idx 1: randP:0.288867,ptr2ngh:5 --> nghOffset -2029677060.
idx 2: randP:0.526483,ptr2ngh:32 --> nghOffset -2024603396.
idx 3: randP:0.922736,ptr2ngh:50 --> nghOffset -2016142724.
idx 4: randP:0.345037,ptr2ngh:25 --> nghOffset -2028477700.
idx 5: randP:0.943210,ptr2ngh:25 --> nghOffset -2015705476.
idx 6: randP:0.759569,ptr2ngh:14 --> nghOffset -2019626628.
idx 7: randP:0.995884,ptr2ngh:2 --> nghOffset -2014580868.
idx 8: randP:0.529909,ptr2ngh:9 --> nghOffset -2024530308.
idx 9: randP:0.238731,ptr2ngh:64 --> nghOffset -2030747524.
Solved: The device memory allocation for nghOffset had a rookie mistake; once I debugged that, the kernel worked great. I will do a better job of explaining the question(s) I need answered next time.
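For anyone hitting the same kind of garbage output, here is a minimal sketch of the host-side allocation/launch pattern the fix boils down to. The setup kernel, the array sizes (N, nPtr), the seed, and the launch configuration are all assumptions on my part since the original host code isn't posted; the point is only that nghOffset gets its own cudaMalloc before the launch and a cudaMemcpy back afterwards, with MCkernel from above compiled in the same file.

#include <cuda_runtime.h>
#include <curand_kernel.h>
#include <cstdio>

// Seed one cuRAND state per thread (assumed setup; any equivalent init works).
__global__ void setupKernel(curandState* globalState, unsigned long long seed)
{
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    curand_init(seed, idx, 0, &globalState[idx]);
}

int main()
{
    const int N    = 10;    // number of threads / samples (assumed)
    const int nPtr = 128;   // length of the nghPtr array (assumed)

    // host data (filled elsewhere in the real program)
    int  h_jump[N]  = {25, 5, 32, 50, 25, 25, 14, 2, 9, 64};
    int* h_nghPtr   = new int[nPtr]();
    int  h_offset[N];

    // device allocations -- nghOffset needs its own cudaMalloc, too
    curandState* d_states;
    int *d_jump, *d_nghPtr, *d_offset;
    cudaMalloc(&d_states, N * sizeof(curandState));
    cudaMalloc(&d_jump,   N * sizeof(int));
    cudaMalloc(&d_nghPtr, nPtr * sizeof(int));
    cudaMalloc(&d_offset, N * sizeof(int));

    cudaMemcpy(d_jump,   h_jump,   N * sizeof(int),    cudaMemcpyHostToDevice);
    cudaMemcpy(d_nghPtr, h_nghPtr, nPtr * sizeof(int), cudaMemcpyHostToDevice);

    setupKernel<<<1, N>>>(d_states, 1234ULL);
    MCkernel<<<1, N>>>(d_states, d_jump, d_nghPtr, d_offset);

    // copy the results back to the host before printing
    cudaMemcpy(h_offset, d_offset, N * sizeof(int), cudaMemcpyDeviceToHost);
    for (int i = 0; i < N; ++i)
        printf("idx %d: nghOffset %d\n", i, h_offset[i]);

    cudaFree(d_states); cudaFree(d_jump); cudaFree(d_nghPtr); cudaFree(d_offset);
    delete[] h_nghPtr;
    return 0;
}

Without the cudaMalloc for d_offset (or with the kernel writing through an unallocated/host pointer), the values copied back are whatever happened to be at that address, which is exactly the kind of large negative numbers in the sample output above.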