
In my current project, a call to cudaGetLastError() is returning unknown error and I don't know why. The code compiles just fine, but it is not behaving how I would like it to.

Below is a brief, non-compilable example of the relevant code:

CU_Main.cu

Below is the CUDA kernel:

//My CUDA kernel
__global__ void CU_KernelTest(Kernel* matrix){
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  
    int idy = blockIdx.y * blockDim.y + threadIdx.y;  

    if(idx == 0 && idy == 0){
            printf("ID is: %d\n", idx);
            matrix->set(1,1, 16.0f);
    }
}

Here is the host code:

//A host function which is called when a button is clicked
int HOST_OnbuttonClick(){
    Kernel* matrix = new Kernel(3,3,2);
    Kernel* device_matrix;

    cudaMalloc(&device_matrix, sizeof(Kernel));
    cudaMemcpy(device_matrix, matrix, sizeof(Kernel), cudaMemcpyHostToDevice);

    CU_KernelTest<<<256, 256>>>(device_matrix);
    cudaDeviceSynchronize();

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        printf("Error: %s\n", cudaGetErrorString(err));
    }


    cudaFree(device_matrix);
    return 0;
}

When matrix->set(1,1, 16.0f); is included in the CUDA kernel, (err != cudaSuccess) evaluates to true and UNKNOWN ERROR is printed, whereas if I comment the set call out, I get no error.

The other struct relevant to this is my own helper for a convolution kernel design I'm going for, naturally called Kernel.

Kernel.cuh

struct Kernel {
    private :
        float* kernel;
        int rows;
        int columns;

    public :

        __device__ __host__
        Kernel(int _rows, int _columns, float _default) {
            rows = _rows;
            columns = _columns;
            kernel = new float[rows * columns];

            for(int r = 0; r < rows; r++){
                for(int c = 0; c < columns; c++){
                    kernel[r * columns + c] = _default; // row-major: stride is the column count
                }
            }
        }

        __device__ __host__
        void set(int row, int col, float value){
            kernel[row * columns + col] = value; // same row-major stride as the constructor
        }
};

The goal of this design is to be able to set all values for the kernel on the host, send it to the CUDA kernel, set values there and then retrieve the updated object back at the host.

So there are really two questions: why do I get an unknown error, and is the code correct in principle, so that it should work? Let me know if more information is needed.

Here are the results of the memory checker:

Nsight Debug
================================================================================
CUDA Memory Checker detected 1 threads caused an access violation:
Launch Parameters
    CUcontext    = 071c7340
    CUstream     = 08f3e3b8
    CUmodule     = 08fa97a8
    CUfunction   = 08fdbbe8
    FunctionName = _Z13CU_KernelTestP6Kernel
    gridDim      = {1,1,1}
    blockDim     = {256,1,1}
    sharedSize   = 128
    Parameters:
        matrix = 0x06b60000  {kernel = 0x07a31718  ???, rows = 3, columns = 3}
    Parameters (raw):
         0x06b60000
GPU State:
   Address  Size      Type  Mem       Block  Thread         blockIdx  threadIdx      PC  Source
-----------------------------------------------------------------------------------------------
  07a31728     4    adr st    g           0       0          {0,0,0}    {0,0,0}  000260  c:\users

Summary of access violations:
c:\users....kernel.cuh(26): error MemoryChecker: #misaligned=0  #invalidAddress=2
BlackBox
  • Have you run the program under cuda-memcheck? – tera Apr 16 '13 at 19:38
  • I have not. How would that be done? – BlackBox Apr 16 '13 at 19:44
  • On the command line just prepend the command with `cuda-memcheck `. More info [here](https://developer.nvidia.com/cuda-memcheck). – tera Apr 16 '13 at 19:52
  • I'm running it from Visual Studio 2010 with Nsight. There is an "enable memory checker" button in the dropdown, but it just gets highlighted. All the debugger shows is `this 0x06b60000 {kernel = 0x07a31718 ???, rows = 3, columns = 3} __device__ Kernel* const` as well as ` row 'row' has no value at the target location.` – BlackBox Apr 16 '13 at 20:03
  • I've updated the original post with the results – BlackBox Apr 16 '13 at 20:06
  • Your `Kernel` class contains a pointer. When you copy the class to the device, you have a host pointer on the device. Dereferencing that on the device gives you this invalid address access violation. – tera Apr 16 '13 at 20:31
  • This seems to be a regular cause for confusion. Robert Crovella has just [explained it](http://stackoverflow.com/a/16024373/1662425) yesterday. – tera Apr 16 '13 at 20:32
  • Interesting, but also very ugly ;p Thanks. Feel free to just put that as an answer so I can credit you for your help :) – BlackBox Apr 16 '13 at 20:53

1 Answer


Your Kernel class contains a pointer. When you copy the class to the device, only the pointer value is copied, not the array it points to, so the device copy of the class holds a host pointer. Dereferencing that pointer on the device gives you this invalid address access violation.

This is a regular cause of confusion. Robert Crovella explained it just yesterday: http://stackoverflow.com/a/16024373/1662425
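
As a rough sketch of one way around this (not the only one), you can do the "deep copy" by hand: allocate the float array on the device, copy the values into it, and only then copy a struct whose pointer member holds that device address. The code below makes several assumptions that are not in the question: it uses a simplified Kernel with public members and a pointer named data, the helper runKernelTest is hypothetical, and the <<<1, 256>>> launch configuration is arbitrary.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Simplified stand-in for the question's Kernel (assumption: public members).
// `data` must hold a device address whenever set() runs inside a kernel.
struct Kernel {
    float* data;
    int rows;
    int columns;

    __host__ __device__
    void set(int row, int col, float value) {
        data[row * columns + col] = value;  // row-major indexing
    }
};

__global__ void CU_KernelTest(Kernel* matrix) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx == 0) {
        matrix->set(1, 1, 16.0f);
    }
}

// Hypothetical helper: fills a rows x columns matrix on the host, deep-copies
// it to the device, runs the kernel, and copies the values back into `out`.
cudaError_t runKernelTest(int rows, int columns, float fill, std::vector<float>& out) {
    out.assign(static_cast<size_t>(rows) * columns, fill);
    size_t bytes = out.size() * sizeof(float);

    float* d_data = nullptr;     // device copy of the float array
    Kernel* d_matrix = nullptr;  // device copy of the struct itself

    // Step 1: allocate and fill the array the struct will point to.
    cudaError_t err = cudaMalloc(&d_data, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(d_data, out.data(), bytes, cudaMemcpyHostToDevice);

    // Step 2: copy a struct whose pointer member is the DEVICE address.
    if (err == cudaSuccess) err = cudaMalloc(&d_matrix, sizeof(Kernel));
    if (err == cudaSuccess) {
        Kernel staging = { d_data, rows, columns };
        err = cudaMemcpy(d_matrix, &staging, sizeof(Kernel), cudaMemcpyHostToDevice);
    }

    // Step 3: launch, then check launch errors and execution errors separately.
    if (err == cudaSuccess) {
        CU_KernelTest<<<1, 256>>>(d_matrix);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    // Step 4: copy the array (not the struct) back to the host.
    if (err == cudaSuccess) err = cudaMemcpy(out.data(), d_data, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_matrix);  // cudaFree on a null pointer is a no-op
    cudaFree(d_data);
    return err;
}
```

Calling runKernelTest(3, 3, 2.0f, values) with a std::vector<float> named values should leave the kernel's write at values[1 * 3 + 1], with the other entries still at the fill value. The important point is that the results come back by copying the device array itself. Copying the struct back would only give you a device pointer on the host, which is the same problem in reverse.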

tera