create a class object inside a global kernel using new and double pointer

Question

I have the following global kernel:

__global__ void pdegpu(PDE_ParabolicD1_Num_GPU **pdes)
{    
    PDE_ParabolicD1_Num_GPU *loc;
    loc = new PDE_ParabolicD1_Num_GPU();
    loc->Setup();
    delete loc;
    //above code was just an example to show that new and delete work fine

    *pdes = new PDE_ParabolicD1_Num_GPU();   //error occurs here
    (*pdes)->Setup();
}

I call this kernel to create an object of type PDE_ParabolicD1_Num_GPU and set it up. I want to use the same object in main(), which is why the kernel takes a double pointer. In main(), I do the following:

PDE_ParabolicD1_Num_GPU pdes_host;
PDE_ParabolicD1_Num_GPU *pdes_dev=0;
pdegpu<<<1,1>>>(&pdes_dev);
cudaStatus = cudaMemcpy(&pdes_host, pdes_dev, sizeof(PDE_ParabolicD1_Num_GPU), cudaMemcpyDeviceToHost);
...
delete [] pdes_dev;

However, I get an error at the line marked in the kernel, and the CUDA Memory Checker reports the following:

Memory Checker detected 1 access violations.
error = access violation on store (global memory)
gridid = 16
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x0018f420
accessSize = 4
error MemoryChecker: #misaligned=0  #invalidAddress=1

As far as I understand, the error is caused by an invalid address.
Could anyone help me resolve the problem? Thank you.

It would probably help to see your kernel invocation as well as the definition of the pointer you are passing to the kernel in **pdes. I'm assuming you cudaMalloc'ed this pointer and I'd like to see how you did it exactly. I'm also a little puzzled as to what you are trying to accomplish exactly. if you are wanting to pass *pdes back with allocated storage for use on the host, I don't think it's going to work the way you intended. Better to cudaMalloc space for the object storage on the host, then initialize it on the device. — Robert Crovella, Feb 26 '13 at 19:29
Looks like pdes is a host memory... Won't the result be the same if you don't do delete and assign *pdes = loc? — Eugene, Feb 26 '13 at 19:45
@Eugene: I tried *pdes=loc but got the same access violation error. — Meriko, Feb 26 '13 at 20:41
@RobertCrovella: I posted the kernel invocation. I don't cudaMalloc the pointer, because I want to allocate space and call the constructor inside the kernel, as well as I want to call other member functions, which I didn't show in the code. I just want to copy the pdes back in order to get the results after doing some computation on pdes. — Meriko, Feb 26 '13 at 20:53

score 1 · Answer 1 · answered Feb 26 '13 at 21:12

You should allocate the memory via cudaMalloc first. Currently &pdes_dev is the address of a host variable, so your device code is trying to write to host memory.

Eugene


score 1 · Answer 2 · edited May 23 '17 at 12:14

pdes_dev is a host pointer. Device code cannot access host pointers. If you want to create a pointer that is modifiable on the device, do something like:

PDE_ParabolicD1_Num_GPU **pdes_dev=0;
cudaMalloc((void ***) &(pdes_dev), sizeof(PDE_ParabolicD1_Num_GPU *));
pdegpu<<<1,1>>>(pdes_dev);
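
For illustration, here is a fuller sketch of this double-pointer variant, including one possible way to get the result back to the host. It is only a sketch under assumptions: PdeStandIn is a hypothetical stand-in for PDE_ParabolicD1_Num_GPU (whose definition is not shown in the question), and the kernel names create and copyOutAndDestroy as well as the staging buffer are made up for this example. The object is created with in-kernel new, then copied by a second kernel into a cudaMalloc'ed buffer that cudaMemcpy can read.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for PDE_ParabolicD1_Num_GPU (the real class is not shown).
// It is trivially copyable, so a raw byte copy back to the host is meaningful.
struct PdeStandIn {
    int   n;
    float dt;
    __host__ __device__ PdeStandIn() : n(0), dt(0.0f) {}
    __device__ void Setup() { n = 128; dt = 0.01f; }
};

// Allocates the object on the device heap and stores its address in a
// pointer slot that itself lives in device global memory.
__global__ void create(PdeStandIn **slot)
{
    *slot = new PdeStandIn();            // in-kernel new may return nullptr
    if (*slot != nullptr) (*slot)->Setup();
}

// Copies the heap object into a cudaMalloc'ed staging buffer, then frees it.
__global__ void copyOutAndDestroy(PdeStandIn **slot, PdeStandIn *staging)
{
    if (*slot != nullptr) {
        *staging = **slot;               // device-to-device copy done in device code
        delete *slot;                    // heap memory must be freed on the device
        *slot = nullptr;
    }
}

int main()
{
    PdeStandIn **slot    = nullptr;      // device memory holding one pointer
    PdeStandIn  *staging = nullptr;      // device memory holding one object

    if (cudaMalloc((void **)&slot, sizeof(PdeStandIn *)) != cudaSuccess ||
        cudaMalloc((void **)&staging, sizeof(PdeStandIn)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(slot, 0, sizeof(PdeStandIn *));
    cudaMemset(staging, 0, sizeof(PdeStandIn));

    create<<<1, 1>>>(slot);
    copyOutAndDestroy<<<1, 1>>>(slot, staging);

    PdeStandIn host;
    cudaError_t err = cudaMemcpy(&host, staging, sizeof(PdeStandIn),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("n = %d, dt = %f\n", host.n, host.dt);

    cudaFree(staging);
    cudaFree(slot);
    return 0;
}
```

The extra kernel and staging buffer are the cost of this variant.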

However, because in-kernel new allocates from the device heap, the resulting pointer cannot be used in a cudaMemcpy operation to get the data stored there back to the host. You would then have to do a device-to-device copy in device code to move the kernel-modified data into a buffer suitable for transfer back to the host. Therefore, a better approach might be to allocate device storage space for the object from the host:

PDE_ParabolicD1_Num_GPU *pdes_dev=0;
cudaMalloc((void **) &(pdes_dev), sizeof(PDE_ParabolicD1_Num_GPU));
pdegpu<<<1,1>>>(pdes_dev);

And your kernel code would have to be changed accordingly:

__global__ void pdegpu(PDE_ParabolicD1_Num_GPU *pdes)
{
    pdes->Setup();
}

Note that I have not compiled or tested this code, so there could be errors, but it outlines the approach that I think will work. You may also be interested in my answer to this question.
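
As a rough end-to-end sketch of the host-allocated approach: the code below uses PdeStandIn as a hypothetical stand-in for PDE_ParabolicD1_Num_GPU, since the real class definition is not shown. One assumption goes beyond the answer above: cudaMalloc only provides raw bytes and does not run a constructor, so the kernel here constructs the object in place with placement new before calling Setup(). If the real class needs no construction, that line can be dropped.

```cuda
#include <cstdio>
#include <new>
#include <cuda_runtime.h>

// Hypothetical stand-in for PDE_ParabolicD1_Num_GPU (the real class is not shown).
struct PdeStandIn {
    int   n;
    float dt;
    __host__ __device__ PdeStandIn() : n(0), dt(0.0f) {}
    __device__ void Setup() { n = 128; dt = 0.01f; }
};

__global__ void pdegpu(PdeStandIn *pdes)
{
    new (pdes) PdeStandIn();   // assumption: construct in place, cudaMalloc runs no constructor
    pdes->Setup();
}

// Prints a message and returns false if a CUDA call failed.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    PdeStandIn *pdes_dev = nullptr;
    if (!ok(cudaMalloc((void **)&pdes_dev, sizeof(PdeStandIn)), "cudaMalloc"))
        return 1;

    pdegpu<<<1, 1>>>(pdes_dev);
    if (!ok(cudaGetLastError(), "kernel launch"))
        return 1;

    // pdes_dev came from cudaMalloc, so it is a valid source for cudaMemcpy.
    PdeStandIn pdes_host;
    if (!ok(cudaMemcpy(&pdes_host, pdes_dev, sizeof(PdeStandIn),
                       cudaMemcpyDeviceToHost), "cudaMemcpy"))
        return 1;

    printf("n = %d, dt = %f\n", pdes_host.n, pdes_host.dt);

    cudaFree(pdes_dev);        // release with cudaFree, not delete or delete[]
    return 0;
}
```

A byte-wise copy like this is only meaningful if the class is trivially copyable and holds no pointers into device memory. A class with such members would need its data copied separately.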
