
I am new to CUDA and I was wondering if I could do something like this:

__global__ void MCkernel ( curandState* globalState, int* jumpGPU, int* nghPtrGPU, \
                      int* nghOffset)

{
    // get idx
    int idx = (threadIdx.x+blockDim.x*blockIdx.x);

    // set up curand and generated state for each thread...
    curandState localState = globalState[idx];
    float randP = curand_uniform( &localState ); // uniform random number in (0, 1]
    globalState[idx] = localState;

    // assume ranges vary by thread index
    int ptr2ngh=jumpGPU[idx]; 
    int min = (int)nghPtrGPU[ptr2ngh];
    int max = (int)nghPtrGPU[ptr2ngh+1];

    // pick a random position in [min, max-1]; min is added only once
    nghOffset[idx] = min + (int)truncf(randP * (max - min - 1) + 0.5f);
}

Here I use the value of jumpGPU[idx] as an index into nghPtrGPU, i.e. nghPtrGPU[jumpGPU[idx]]. Is this allowed in a kernel, and if so, what am I doing wrong? The kernel above outputs the correct randP and ptr2ngh, but not the correct nghOffset array. Any help would be appreciated. Thanks!

Sample Output:

idx 0:  randP:0.200745,ptr2ngh:25 --> nghOffset -2031558532.
idx 1:  randP:0.288867,ptr2ngh:5 --> nghOffset -2029677060.
idx 2:  randP:0.526483,ptr2ngh:32 --> nghOffset -2024603396.
idx 3:  randP:0.922736,ptr2ngh:50 --> nghOffset -2016142724.
idx 4:  randP:0.345037,ptr2ngh:25 --> nghOffset -2028477700.
idx 5:  randP:0.943210,ptr2ngh:25 --> nghOffset -2015705476.
idx 6:  randP:0.759569,ptr2ngh:14 --> nghOffset -2019626628.
idx 7:  randP:0.995884,ptr2ngh:2 --> nghOffset -2014580868.
idx 8:  randP:0.529909,ptr2ngh:9 --> nghOffset -2024530308.
idx 9:  randP:0.238731,ptr2ngh:64 --> nghOffset -2030747524.

Solved: The device memory allocation for nghOffset contained a rookie mistake. Once I found and fixed it, the kernel worked great. I will do a better job of explaining my questions in the future.

PhdGirl
  • It's not possible to answer unless you show the code where you allocate and fill the arrays! – hubs Mar 03 '14 at 21:27
  • What are some example values of `min` and `max` and `randP` and the corresponding value of `nghOffset[idx]` that you think is incorrect? – Robert Crovella Mar 03 '14 at 21:27
  • It's not clear to me what your question is. "Wondering if I can do something like this" followed by some code followed by "what am I doing wrong?" is vague about what the problem is. What exactly are you trying to accomplish, specifically what results are you getting, and how do they differ from the desired results (provide an example rather than just a description)? – Adi Inbar Mar 03 '14 at 21:44
  • My kernel seems to work until I try to use nghPtrGPU[Jump[idx]]. Is there a reason this wouldn't work within a kernel? Perhaps the sample output makes the question clearer. – PhdGirl Mar 03 '14 at 22:02
  • I'm not sure how the sample output helps. None of us know what `nghPtrGPU[]` contains, so telling us the offset into it isn't enough info. Did you notice I asked for `min` and `max`? Why not print those out as well? Are you doing [proper cuda error checking](http://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api) on your code (kernel calls and cuda API calls)? How about running your code with `cuda-memcheck`? – Robert Crovella Mar 03 '14 at 22:08
  • On a cc 2.0 or newer device, you can put `printf` statements directly in the kernel code for debugging purposes. There's no problem in CUDA with using one array element as the index into another array. CUDA mostly adheres to C/C++, where this is perfectly legal. – Robert Crovella Mar 03 '14 at 22:19
  • @Robert Crovella: Will try with cuda-memcheck and report back with any errors. Thanks for the tip. – PhdGirl Mar 03 '14 at 23:37
  • I didn't want the code debugged; we novices learn by doing. I just wanted to clarify whether calling array[array2[idx]] was an illegal access and, if it is legal, whether I was doing it the correct way. @Robert Crovella: Thanks for the clarification and the tip to use cuda-memcheck, which was very useful. – PhdGirl Mar 05 '14 at 02:11
  • @Adi inbar, hubs : will do better with my questions next time around :) Thanks for chipping in. – PhdGirl Mar 05 '14 at 02:14

1 Answer


Yes, it's legal to use one array member as the index into another array, such as:

int min = nghPtrGPU[jumpGPU[idx]];

This is legal in C/C++, and it's legal in CUDA C/C++ as well. Obviously this says nothing about indexing out-of-bounds, etc. The value returned by jumpGPU[idx] must be a valid index into the array nghPtrGPU[]. That's also no different than C/C++.
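As a minimal sketch of the point above, the following shows one array element used as the index into another array, with an explicit bounds check before the second lookup. The names `gatherChecked` and `gatherOnDevice`, the `-1` sentinel, and the launch configuration are hypothetical choices for illustration and are not taken from the question's code; the sketch also assumes `n > 0`.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: out[i] = table[index[i]], or -1 when index[i]
// is not a valid position in table.
__global__ void gatherChecked(const int* index, const int* table,
                              int* out, int n, int tableSize)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i >= n) return;                 // guard threads past the end of the arrays

    int j = index[i];                   // first lookup: fetch the index
    // second lookup only if j is a valid index into table
    out[i] = (j >= 0 && j < tableSize) ? table[j] : -1;
}

// Hypothetical host wrapper: allocates device buffers, copies inputs,
// launches the kernel, and copies the result back.
// Returns the first CUDA error encountered (cudaSuccess if none).
cudaError_t gatherOnDevice(const int* hIndex, const int* hTable,
                           int* hOut, int n, int tableSize)
{
    int *dIndex = nullptr, *dTable = nullptr, *dOut = nullptr;

    // Sizes are byte counts: element count times sizeof(int).
    cudaError_t err = cudaMalloc(&dIndex, n * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&dTable, tableSize * sizeof(int));
    if (err == cudaSuccess) err = cudaMalloc(&dOut, n * sizeof(int));
    if (err == cudaSuccess)
        err = cudaMemcpy(dIndex, hIndex, n * sizeof(int), cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(dTable, hTable, tableSize * sizeof(int), cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;   // round up to cover all n elements
        gatherChecked<<<blocks, threads>>>(dIndex, dTable, dOut, n, tableSize);
        err = cudaGetLastError();                   // catches launch errors
    }
    if (err == cudaSuccess)                         // blocking copy, waits for the kernel
        err = cudaMemcpy(hOut, dOut, n * sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dIndex);                               // cudaFree(nullptr) is a no-op
    cudaFree(dTable);
    cudaFree(dOut);
    return err;
}
```

Calling `gatherOnDevice` with an index array `{3, 0, 4, 7}` and a five-element table `{10, 11, 12, 13, 14}` should fill the output with `13, 10, 14, -1`: the last index is out of range, so the kernel writes the sentinel instead of reading past the end of the table.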

Probably the most common problems you might run into with the above construct would be either:

  1. jumpGPU[idx] does not represent a valid array index into nghPtrGPU[]
  2. one of the pointers jumpGPU or nghPtrGPU has not been properly allocated on the device

Either type of error can be spotted with the cuda-memcheck tool. Any time you're having trouble with CUDA code, it's a good idea to do proper CUDA error checking.
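One way such error checking might look is sketched below. The macro name `CUDA_CHECK`, the kernel `fillIdx`, and the helper `allocAndFill` are hypothetical names introduced only for this illustration. The sketch wraps every runtime API call and checks the kernel launch twice: once with `cudaGetLastError()` for launch errors and once with `cudaDeviceSynchronize()` for errors that occur while the kernel runs.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a readable message if a CUDA
// runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                               \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",               \
                    cudaGetErrorString(err_), __FILE__, __LINE__);     \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel: writes each thread's global index into out.
__global__ void fillIdx(int* out, int n)
{
    int i = threadIdx.x + blockDim.x * blockIdx.x;
    if (i < n) out[i] = i;
}

// Allocates an n-element device buffer and fills it with 0..n-1.
// The caller owns the returned pointer and must cudaFree it.
int* allocAndFill(int n)
{
    int* dOut = nullptr;
    // The size argument is in bytes, so multiply by sizeof(int).
    CUDA_CHECK(cudaMalloc(&dOut, n * sizeof(int)));

    int threads = 128;
    int blocks = (n + threads - 1) / threads;
    fillIdx<<<blocks, threads>>>(dOut, n);
    CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised during execution
    return dOut;
}
```

Running the compiled program under the memory checker (for example `cuda-memcheck ./app`, where `./app` is a placeholder executable name) reports out-of-bounds device accesses that this kind of API-level checking alone may not pinpoint. Recent CUDA toolkits provide `compute-sanitizer` in place of `cuda-memcheck`.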

For the most part, CUDA C/C++ adheres to C/C++ language rules, with some restrictions.

In fact, CUDA C/C++ is implemented mainly as a C++ compiler with GPU-specific extensions. However, since C and C++ have much in common, we can generalize in many cases and call it a C/C++ style language.

Community
Robert Crovella