I'm trying to initialize some structs on the device, but when they come back, they're full of garbage. I know that something is wrong with how I'm using my pointers, but I can't figure out what.

Here's my memory allocation on the host:

body* devBodies;
body** devBodyList;
float* devRandoms;
cudaMalloc((void**)&devBodies, n * sizeof(body));
cudaMalloc((void**)&devBodyList, n * sizeof(body*));
cudaMalloc((void**)&devRandoms, n * 3 * sizeof(float));

Here's my kernel launch, where devRandoms is just an array of floats:

CUDAInitBodies<<<n,1>>>(devBodyList, devBodies, devRandoms);

And here's my kernel:

__global__ void CUDAInitBodies(body** devBodyList, body* devBody, float* rand)
{
    int j;
    int tid = blockIdx.x;
    *(devBodyList[tid]) = devBody[tid];
    devBodyList[tid]->id = tid;
    devBodyList[tid]->m = 10;
    for(j = 0; j < 3; j++) 
    {
        devBodyList[tid]->a[j] = 0;
        devBodyList[tid]->v[j] = 0;
        devBodyList[tid]->pos[j] = rand[(tid * 3) + j];
    }   

}

I then copy the data back to the host:

body* bodies = (body*)malloc(n * sizeof(body));
cudaMemcpy(bodies, devBodies, n * sizeof(body), cudaMemcpyDeviceToHost);

When I print out the fields of my "body", I get this:

====BODY[-581043205]====
    M = -42522218781525353518415985938704367616.000000
    V = [-311601248975690312470654313562112.000000, 17269896096570671104.000000, 307939529506715418513587721849804947456.000000]
    X = -19247336126697914498972549714433540096.000000
    Y = 17731266573644159438123340575306416128.000000
    Z = -544771403677696.000000

I've been trying different ways of doing this for quite a while, but nothing seems to do the trick.

Chris Phillips
  • It doesn't look like you've initialized the `devBodyList` pointer array. I can see that you've allocated storage for it, but I don't see where you've set each pointer to point to something (like a `body` struct). Try adding this line near the beginning of your kernel: `devBodyList[tid] = &(devBody[tid]);` It's also worth noting that passing data referenced by a double pointer (`**`) to or from a kernel is somewhat difficult. Search on "CUDA 2D Array". And if you ran your code with `cuda-memcheck`, I think you might discover some out-of-bounds accesses due to the uninitialized pointers. – Robert Crovella Apr 20 '14 at 00:04
  • Yeah, I initialized that variable. I just didn't put that in my post. I edited my post to be more clear. – Chris Phillips Apr 20 '14 at 00:10
  • Where did you initialize each pointer in the array? I understand that you've allocated storage for an array of pointers on the device, but before you can dereference one of them in your kernel, it must point to something valid (on the device). Did you try running your code with `cuda-memcheck` ? – Robert Crovella Apr 20 '14 at 00:19
  • `*(devBodyList[tid]) = devBody[tid];`, which happens inside my kernel. – Chris Phillips Apr 20 '14 at 00:28
  • That is *dereferencing* `devBodyList[tid]`. But where is `devBodyList[tid]` set to a proper value? You seem to be struggling with basic C pointer concepts here. I assure you, based on what you've shown here, you are dereferencing an invalid pointer in your kernel. If you don't believe me, try running your code with `cuda-memcheck`. The invalid global write error that will result is due to `devBodyList[tid]` not being set to a correct value. – Robert Crovella Apr 20 '14 at 00:51

1 Answer

The reason you are getting garbage output is that you are dereferencing an uninitialized pointer at this line of code in your kernel:

*(devBodyList[tid]) = devBody[tid];

On the host, you allocated storage for an array of pointers here:

cudaMalloc((void**)&devBodyList, n * sizeof(body*));

That only creates storage for the array of pointers. It does not set any of the pointers in the array to point to anything valid.

Each member of the array is a pointer to a body struct. Taking the first member as an example:

devBodyList[0]

But that pointer does not point to anything (valid) unless you initialize it, with a statement like (in device code):

devBodyList[0] = &(devBody[0]);

Now I can use devBodyList[0] as a pointer to the body structure allocated at devBody[0]. Note that devBodyList[0] is stored on the device, so to initialize it I must either:

  1. Initialize it in device code before using it (such as the above line of code).
  2. Initialize it on the host, by setting up a valid pointer and then using an operation like cudaMemcpy to copy that initialized value to the device storage that was allocated.
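As a minimal sketch of the second option, the pointer table can be built in a host array and copied over in one cudaMemcpy. The `hostBodyList` name and the simplified `body` fields are assumptions made for illustration, not taken from the question's code:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Stand-in for the question's struct; the field types are an assumption.
struct body {
    int id;
    float m;
    float pos[3];
};

int main() {
    const int n = 4;
    body *devBodies = NULL;
    body **devBodyList = NULL;
    cudaMalloc((void**)&devBodies, n * sizeof(body));
    cudaMalloc((void**)&devBodyList, n * sizeof(body*));

    // Build the pointer table on the host (hostBodyList is a hypothetical name).
    // The values are device addresses: they are computed here with pointer
    // arithmetic but never dereferenced on the host.
    body **hostBodyList = (body**)malloc(n * sizeof(body*));
    for (int i = 0; i < n; i++) {
        hostBodyList[i] = devBodies + i;
    }

    // Copy the initialized pointers into the device-side array.
    cudaError_t err = cudaMemcpy(devBodyList, hostBodyList,
                                 n * sizeof(body*), cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        printf("cudaMemcpy failed: %s\n", cudaGetErrorString(err));
    }

    // From here on, a kernel may dereference devBodyList[i] safely.
    free(hostBodyList);
    cudaFree(devBodyList);
    cudaFree(devBodies);
    return 0;
}
```

With the table set up this way, the kernel no longer needs to assign `devBodyList[tid]` itself before dereferencing it.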

Unless you account for the above, when you go to use the pointer, it will contain a garbage value, and dereferencing it will produce invalid accesses. You can see evidence of these invalid accesses by running your code with cuda-memcheck. You'll get a message like "invalid global write..."
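Besides cuda-memcheck, such failures can also be surfaced from inside the program by checking the runtime's error codes after the launch. The following is only a sketch under assumptions: `writeThrough` is a hypothetical kernel, and the pointer is deliberately initialized so that the program runs cleanly as written:

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Hypothetical kernel: writes through whatever pointer the table holds.
__global__ void writeThrough(int **list) {
    *(list[blockIdx.x]) = 1;
}

int main() {
    const int n = 1;
    int *devData = NULL;
    int **devList = NULL;
    cudaMalloc((void**)&devData, n * sizeof(int));
    cudaMalloc((void**)&devList, n * sizeof(int*));

    // Make devList[0] point at devData by copying the pointer value itself.
    cudaMemcpy(devList, &devData, sizeof(int*), cudaMemcpyHostToDevice);

    writeThrough<<<n, 1>>>(devList);

    // cudaGetLastError reports launch problems; cudaDeviceSynchronize waits
    // for the kernel and reports errors raised while it was running.
    cudaError_t launchErr = cudaGetLastError();
    cudaError_t syncErr = cudaDeviceSynchronize();
    if (launchErr != cudaSuccess) {
        printf("launch error: %s\n", cudaGetErrorString(launchErr));
    }
    if (syncErr != cudaSuccess) {
        printf("kernel error: %s\n", cudaGetErrorString(syncErr));
    }

    cudaFree(devList);
    cudaFree(devData);
    return 0;
}
```

If the cudaMemcpy that fills `devList` were left out, the kernel would dereference whatever bits happen to be in that allocation. That will often, though not necessarily always, show up as an error from cudaDeviceSynchronize.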

Here's a fully worked example, based on your pieces, that does not produce garbage results. (I'm not saying it's particularly sensible, since using devBodyList[x] to refer to devBody[x] seems unnecessary to me.) But it is legal code and will not produce any kernel errors:

#include <stdio.h>
#include <stdlib.h>

struct body {
    int id;
    int m;
    int a[3];
    int v[3];
    float pos[3];
};

__global__ void CUDAInitBodies(body** devBodyList, body* devBody, float* rand)
{
    int j;
    int tid = blockIdx.x;
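    devBodyList[tid] = &(devBody[tid]);   // point the list entry at valid device storage first
    *(devBodyList[tid]) = devBody[tid];   // now legal, though it just copies the struct onto itself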
    devBodyList[tid]->id = tid;
    devBodyList[tid]->m = 10;
    for(j = 0; j < 3; j++)
    {
        devBodyList[tid]->a[j] = 0;
        devBodyList[tid]->v[j] = 0;
        devBodyList[tid]->pos[j] = rand[(tid * 3) + j];
    }

}

int main(){
  int n = 1;
  body *devBodies;
  body **devBodyList;
  float *devRandoms;

  cudaMalloc((void**)&devBodies, n * sizeof(body));
  cudaMalloc((void**)&devBodyList, n * sizeof(body*));
  cudaMalloc((void**)&devRandoms, n * 3 * sizeof(float));

  CUDAInitBodies<<<n,1>>>(devBodyList, devBodies, devRandoms);

  body* bodies = (body*)malloc(n * sizeof(body));
  cudaMemcpy(bodies, devBodies, n * sizeof(body), cudaMemcpyDeviceToHost);

  printf("Body %d\n", bodies[0].id);
  printf("M : %d\n", bodies[0].m);
  printf("V : %d\n", bodies[0].v[0]);
  return 0;
}
Robert Crovella
  • I appreciate the in-depth explanation. I'm going back to basics to try and get this. Can you look at this for me? http://stackoverflow.com/questions/23177802/cuda-headache-im-just-not-getting-it – Chris Phillips Apr 20 '14 at 02:36