I referred to this post: loading an array of structs with arrays onto cuda

In the code from that post, cpuPointArray (an array of structs) is allocated on the CPU with malloc, whereas the struct members (float* c and float* d) are allocated with cudaMalloc.

Can someone please explain why this is done?

Also, I don't understand why, in the following loop (from the above link), cpuPointArray is used as the source when copying the data from device to host. When I replaced cpuPointArray with gpuPointArray, I got a segmentation fault. Using cuda-gdb, I found that gpuPointArray.c is NULL. Can someone please explain why it is NULL? Here is the loop:

for (int k = 0; k < 16; k++) {
    printf("creating memory on cpu for array c\n");
    outPointArray[k].c = (float*)malloc(16*sizeof(float));
    printf("creating memory on cpu for array d\n");
    outPointArray[k].d = (float*)malloc(16*sizeof(float));
    printf("copying memory values onto cpu array c\n");
    err = cudaMemcpy(outPointArray[k].c, cpuPointArray[k].c, 16*sizeof(float), cudaMemcpyDeviceToHost);
    checkerror(err, "copy array c from gpu to cpu");
    printf("copying memory values onto cpu array d\n");
    err = cudaMemcpy(outPointArray[k].d, cpuPointArray[k].d, 16*sizeof(float), cudaMemcpyDeviceToHost);
    checkerror(err, "copy array d from gpu to cpu");
    printf("bottom of loop %d\n", k);
}
user1274878
  • Because if you `cudaMalloc` `cpuPointArray`, you won't be able to dereference it with the `[]` operator in host code and access its members. With the current technology, you cannot directly access device memory from host code. What the code in the link does is called deep copying. – Farzad Apr 26 '14 at 00:52
  • Any links explaining what deep copying is? Why not allocate the struct members on the host, and then copy them to the GPU? What is the benefit of using `cudaMalloc` for the struct members? – user1274878 Apr 26 '14 at 16:41
  • `cudaMalloc` is used on the struct members, so that when the struct is copied to the GPU, the struct members (pointers) point to valid allocations that can be used on the GPU. If you allocate the struct on the CPU, and allocate its pointer members using `malloc`, then when you copy the struct to the GPU, the pointers allocated using `malloc` will be pointing to host memory, which can't be accessed from GPU code. – Robert Crovella Apr 26 '14 at 17:06
  • Deep copying refers to copying a data structure (or array of structures) that contains embedded pointers. When we copy such a data structure on the host, the embedded pointers are still valid. When we copy such a structure to the device, the embedded pointers are no longer valid, and must be re-created using device allocations and their corresponding pointers. A more complete description is [here](http://stackoverflow.com/questions/15431365/cudamemcpy-segmentation-fault/15435592#15435592). – Robert Crovella Apr 26 '14 at 17:16
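
To make the comments above concrete, here is a minimal sketch of the deep-copy pattern they describe. It is not the linked post's exact code: the `point` layout (only the members `c` and `d` are known from the question), the `check` helper standing in for `checkerror`, and the `fill` kernel are all assumptions made for illustration. The key point is that `cpuPointArray` lives in host memory but stores device addresses, so the host can read `cpuPointArray[k].c` and hand it to `cudaMemcpy`, while `gpuPointArray[k].c` would dereference a device pointer in host code.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed struct layout; only the pointer members c and d appear in the question.
struct point {
    float *c;
    float *d;
};

const int N = 16;  // number of structs, as in the loop from the question
const int M = 16;  // floats per member array

// Hypothetical helper standing in for the post's checkerror().
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        exit(EXIT_FAILURE);
    }
}

// Hypothetical kernel: one block per struct, one thread per array element.
__global__ void fill(point *pts) {
    int k = blockIdx.x;
    int i = threadIdx.x;
    pts[k].c[i] = (float)k;  // fine on the device: pts and pts[k].c are device pointers
    pts[k].d[i] = (float)i;
}

int main() {
    // Host-resident array of structs whose members will hold DEVICE addresses.
    point *cpuPointArray = (point *)malloc(N * sizeof(point));
    if (cpuPointArray == NULL) return EXIT_FAILURE;
    for (int k = 0; k < N; k++) {
        check(cudaMalloc((void **)&cpuPointArray[k].c, M * sizeof(float)), "alloc c");
        check(cudaMalloc((void **)&cpuPointArray[k].d, M * sizeof(float)), "alloc d");
    }

    // Device-resident copy of the struct array; its embedded pointers are
    // already valid on the GPU because they came from cudaMalloc.
    point *gpuPointArray = NULL;
    check(cudaMalloc((void **)&gpuPointArray, N * sizeof(point)), "alloc structs");
    check(cudaMemcpy(gpuPointArray, cpuPointArray, N * sizeof(point),
                     cudaMemcpyHostToDevice), "copy structs to gpu");

    fill<<<N, M>>>(gpuPointArray);
    check(cudaGetLastError(), "kernel launch");
    check(cudaDeviceSynchronize(), "kernel run");

    // Copy one member array back. The source address is read from the
    // host-side cpuPointArray; writing gpuPointArray[3].c here instead would
    // dereference a device pointer on the host, which is invalid.
    float hostC[M];
    check(cudaMemcpy(hostC, cpuPointArray[3].c, sizeof(hostC),
                     cudaMemcpyDeviceToHost), "copy c from gpu to cpu");
    printf("c[0] of struct 3 = %f\n", hostC[0]);

    // Release device allocations through the host-side copies of the pointers.
    for (int k = 0; k < N; k++) {
        cudaFree(cpuPointArray[k].c);
        cudaFree(cpuPointArray[k].d);
    }
    cudaFree(gpuPointArray);
    free(cpuPointArray);
    return 0;
}
```

If the members were allocated with `malloc` instead, the struct array copied to the GPU would carry host addresses, and the kernel's writes through `pts[k].c` would fail, which is the situation Robert Crovella's comment describes.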
