I have the following global kernel:

__global__ void pdegpu(PDE_ParabolicD1_Num_GPU **pdes)
{    
    PDE_ParabolicD1_Num_GPU *loc;
    loc = new PDE_ParabolicD1_Num_GPU();
    loc->Setup();
    delete loc;
    //above code was just an example to show that new and delete work fine

    *pdes = new PDE_ParabolicD1_Num_GPU();   //error occurs here
    (*pdes)->Setup();
}

I call this kernel to create an object of type PDE_ParabolicD1_Num_GPU and set it up. I need to use the same object in main(), which is why the kernel takes a double pointer as its argument. In main(), I do the following:

PDE_ParabolicD1_Num_GPU pdes_host;
PDE_ParabolicD1_Num_GPU *pdes_dev=0;
pdegpu<<<1,1>>>(&pdes_dev);
cudaStatus = cudaMemcpy(&pdes_host, pdes_dev, sizeof(PDE_ParabolicD1_Num_GPU), cudaMemcpyDeviceToHost);
...
delete [] pdes_dev;

However, I get an error at the line marked in the kernel, and the CUDA Memory Checker reports the following:

Memory Checker detected 1 access violations.
error = access violation on store (global memory)
gridid = 16
blockIdx = {0,0,0}
threadIdx = {0,0,0}
address = 0x0018f420
accessSize = 4
error MemoryChecker: #misaligned=0  #invalidAddress=1

As far as I understand, the error is caused by an invalid address.
Could anyone help me resolve the problem? Thank you.

Meriko
  • It would probably help to see your kernel invocation as well as the definition of the pointer you are passing to the kernel in **pdes. I'm assuming you cudaMalloc'ed this pointer, and I'd like to see exactly how you did it. I'm also a little puzzled as to what you are trying to accomplish. If you want to pass *pdes back with allocated storage for use on the host, I don't think it's going to work the way you intended. Better to cudaMalloc space for the object storage from the host, then initialize it on the device. – Robert Crovella Feb 26 '13 at 19:29
  • Looks like pdes points to host memory... Won't the result be the same if you skip the delete and assign *pdes = loc? – Eugene Feb 26 '13 at 19:45
  • @Eugene: I tried *pdes=loc but got the same access violation error. – Meriko Feb 26 '13 at 20:41
  • @RobertCrovella: I posted the kernel invocation. I don't cudaMalloc the pointer because I want to allocate the space and call the constructor inside the kernel, and I also want to call other member functions that I didn't show in the code. I just want to copy pdes back to the host to get the results after doing some computation on it. – Meriko Feb 26 '13 at 20:53

2 Answers

You should allocate the memory with cudaMalloc first. As written, your device code is trying to write to host memory, because &pdes_dev is the address of a host variable.

Eugene

pdes_dev lives in host memory, so &pdes_dev is a host address, and device code cannot dereference ordinary host addresses. If you want a pointer that device code can modify, do something like:

PDE_ParabolicD1_Num_GPU **pdes_dev=0;
cudaMalloc((void **) &pdes_dev, sizeof(PDE_ParabolicD1_Num_GPU *));  // one device-side pointer slot
pdegpu<<<1,1>>>(pdes_dev);

Since allocating with in-kernel new creates a pointer into the device heap, such a pointer cannot be used in a cudaMemcpy operation to get the data stored there back to the host. You would first have to do a device-to-device copy, in a kernel, to move the kernel-modified data into a buffer suitable for transfer back to the host. Therefore, a better approach might be to allocate device storage for the object from the host:

PDE_ParabolicD1_Num_GPU *pdes_dev=0;
cudaMalloc((void **) &(pdes_dev), sizeof(PDE_ParabolicD1_Num_GPU));
pdegpu<<<1,1>>>(pdes_dev);

And your kernel code would have to be changed accordingly:

__global__ void pdegpu(PDE_ParabolicD1_Num_GPU *pdes)
{    

    pdes->Setup();
}
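
To show how the host-side allocation and the simplified kernel fit together, here is a minimal end-to-end sketch. The real PDE_ParabolicD1_Num_GPU class is not shown in the question, so the struct below (its fields n and dt and the body of Setup()) is a hypothetical stand-in, and the ok() helper is also just an assumption for error reporting. Note that cudaMalloc returns raw memory and runs no constructor, so this sketch assumes Setup() performs all of the initialization.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in: the real class definition is not shown in the question.
struct PDE_ParabolicD1_Num_GPU {
    int    n;
    double dt;
    __device__ void Setup() { n = 128; dt = 1.0e-3; }
};

// Hypothetical helper: prints a message and returns false on any CUDA error.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

__global__ void pdegpu(PDE_ParabolicD1_Num_GPU *pdes)
{
    pdes->Setup();   // pdes points to memory obtained from cudaMalloc
}

int main()
{
    PDE_ParabolicD1_Num_GPU pdes_host;
    PDE_ParabolicD1_Num_GPU *pdes_dev = 0;

    // Allocate device storage for one object from the host.
    if (!ok(cudaMalloc((void **)&pdes_dev, sizeof(PDE_ParabolicD1_Num_GPU)), "cudaMalloc"))
        return 1;

    pdegpu<<<1, 1>>>(pdes_dev);
    if (!ok(cudaGetLastError(), "kernel launch") ||
        !ok(cudaDeviceSynchronize(), "kernel execution")) {
        cudaFree(pdes_dev);
        return 1;
    }

    // The pointer came from cudaMalloc, so cudaMemcpy can read from it.
    if (!ok(cudaMemcpy(&pdes_host, pdes_dev, sizeof(PDE_ParabolicD1_Num_GPU),
                       cudaMemcpyDeviceToHost), "cudaMemcpy")) {
        cudaFree(pdes_dev);
        return 1;
    }

    printf("n = %d, dt = %g\n", pdes_host.n, pdes_host.dt);

    cudaFree(pdes_dev);   // memory from cudaMalloc is released with cudaFree, not delete
    return 0;
}
```

Copying the object byte-for-byte like this only makes sense if it holds no pointers into device memory; if the real class does, those members would need their own transfers.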

Note that I have not compiled and tested this code so there could be errors, but this outlines the approach that I think will work. You may also be interested in my answer to this question
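
For comparison, the first route (in-kernel new plus a device-to-device copy) could look roughly like the sketch below. The struct is again a hypothetical stand-in for the real class, and the second kernel, pde_export, as well as the ok() helper, are names invented here for illustration. The idea is that the object stays on the device heap between launches, and a second kernel copies it into a cudaMalloc'ed staging buffer that cudaMemcpy is allowed to read.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in: the real class definition is not shown in the question.
struct PDE_ParabolicD1_Num_GPU {
    int    n;
    double dt;
    __device__ void Setup() { n = 128; dt = 1.0e-3; }
};

// Hypothetical helper: prints a message and returns false on any CUDA error.
static bool ok(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Creates the object on the device heap and stores its address in *slot.
__global__ void pdegpu(PDE_ParabolicD1_Num_GPU **slot)
{
    *slot = new PDE_ParabolicD1_Num_GPU();   // in-kernel new yields NULL on failure
    if (*slot) (*slot)->Setup();
}

// Hypothetical second kernel: copies the heap object into the staging buffer
// and then releases the heap allocation.
__global__ void pde_export(PDE_ParabolicD1_Num_GPU **slot,
                           PDE_ParabolicD1_Num_GPU *staging)
{
    if (*slot) {
        *staging = **slot;   // device-to-device copy done by device code
        delete *slot;
        *slot = 0;
    }
}

int main()
{
    PDE_ParabolicD1_Num_GPU   pdes_host = {0, 0.0};
    PDE_ParabolicD1_Num_GPU **slot_dev = 0;      // device-side pointer slot
    PDE_ParabolicD1_Num_GPU  *staging_dev = 0;   // buffer visible to cudaMemcpy

    if (!ok(cudaMalloc((void **)&slot_dev, sizeof(PDE_ParabolicD1_Num_GPU *)), "cudaMalloc slot") ||
        !ok(cudaMalloc((void **)&staging_dev, sizeof(PDE_ParabolicD1_Num_GPU)), "cudaMalloc staging") ||
        !ok(cudaMemset(staging_dev, 0, sizeof(PDE_ParabolicD1_Num_GPU)), "cudaMemset"))
        return 1;

    pdegpu<<<1, 1>>>(slot_dev);
    pde_export<<<1, 1>>>(slot_dev, staging_dev);   // same stream, so it runs after pdegpu
    if (!ok(cudaGetLastError(), "kernel launch") ||
        !ok(cudaDeviceSynchronize(), "kernel execution"))
        return 1;

    if (!ok(cudaMemcpy(&pdes_host, staging_dev, sizeof(PDE_ParabolicD1_Num_GPU),
                       cudaMemcpyDeviceToHost), "cudaMemcpy"))
        return 1;

    printf("n = %d, dt = %g\n", pdes_host.n, pdes_host.dt);

    cudaFree(staging_dev);
    cudaFree(slot_dev);
    return 0;
}
```

The extra kernel and staging buffer are the cost of constructing the object inside the kernel, which is why allocating the storage from the host tends to be simpler when the object only needs to be filled in on the device.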

Robert Crovella