Why is my CUDA kernel returning old values?

Question

Kind of almost at the point of ripping my hair out over this issue.

I have a CUDA kernel that does some math on data stored in a 3D array. While testing this, I used to assign some values (non-zero) to the array and observe results. I commented out those lines since, but the result is still the same. It is as if it is completely ignoring the fact that I'm doing a memset to 0.

The code works correctly when I step through it in Debug... But not in Release! My guess is I have a memory leak from this matrix.

I allocate this array as:

cudaExtent m_extent = make_cudaExtent(sizeof(float)*matdim.x, matdim.y, matdim.z); // width, height, depth
cudaPitchedPtr m_device;
cudaMalloc3D(&m_device, m_extent);
cudaMemset3D(m_device, 0, m_extent);

I call the kernel in a loop like this:

for (int iter = 0; iter < gpu_iterations; iter++)
    {
        PF_iteration_kernel<<<grids,threads>>>(m_device, m_extent, matdim);
        cudaDeviceSynchronize(); 
    }

After which I release the m_device pitched pointer:

cudaFree(m_device.ptr);

matdim is just matrix dimensions held by a dim3.

Within the kernel I do the following (well, I commented everything functional out...):

__global__ void PF_iteration_kernel(cudaPitchedPtr mPtr, cudaExtent mExt, dim3 matrix_dimensions)
{
int x = threadIdx.x + blockIdx.x * blockDim.x;
int y = threadIdx.y + blockIdx.y * blockDim.y;

// Find location within the pitched memory
char *m = (char*)mPtr.ptr;

int sof = sizeof(float);
size_t pitch = mPtr.pitch;
size_t slice_pitch = pitch*mExt.height;
char* m_addroff = m + y * pitch + x * sof;
printf("m(%d,%d) is %f \n", x, y, *m_addroff); // display the slice

*m_addroff = 0; // WILL THIS RESET IT?!

__syncthreads();
}

That should be just showing 0s, but it displays my old values (25, 26, 27, 28, etc).

I have cleaned and re-cleaned and re-built everything several times. I have relaunched the IDE.

My IDE is Visual Studio 2010 with NSight 4.6 (CUDA 7.0). I am on Windows 7 x64.

Has this previous SO question any relevance? http://stackoverflow.com/questions/10611451/how-to-use-make-cudaextent-to-define-a-cudaextent-correctly — Weather Vane, Apr 24 '15 at 18:36
@WeatherVane, I don't think so. The accepted answer doesn't even release any memory that they allocated. All I'm doing is a Malloc and a Memset, but the person in the other question was asking about Memcpy as well. — Mewa, Apr 24 '15 at 18:43
In `printf("m(%d,%d) is %f \n", x, y, *m_addroff);` the compiler will surely see a `char` and promote it to `int` pushed on to stack - not a `float` promoted to `double` that the format requires? Because `char* m_addroff` is not a `float` and the compiler does not push args according to the format spec - although some compilers will warn of problems. — Weather Vane, Apr 24 '15 at 18:52
Interesting! I guess I gotta typecast `m_addroff` to float before attempting to printf() it. It must use some old values otherwise? I find it strange that it found some previous values to display anyways. But hey, if it works, it works. Thanks @WeatherVane :) If you want to post it as an answer, I'll accept it. (Wowzers...) — Mewa, Apr 24 '15 at 19:20

Weather Vane · Accepted Answer · 2015-04-24T19:42:17.850

Consider this

char* m_addroff = m + y * pitch + x * sof;
printf("m(%d,%d) is %f \n", x, y, *m_addroff);

The compiler sees a char argument, promotes it to int, and passes that to printf. The %f format requires a double, which is what a float argument would have been promoted to, so the argument and the format do not match.

The compiler does not convert arguments to fit the format spec, although some compilers will examine the format specs and warn about mismatches.

I suggest you cast the pointer to float* before dereferencing it. I am guessing here and may be wrong, but something like this

printf("m(%d,%d) is %f \n", x, y, *(float*)m_addroff);

Here is a simple example.

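#include <stdio.h>

int main(void)
{
    char car[4] = {0};   /* four zero bytes standing in for the array memory */
    char *cptr = car;
    /* Dereference through float*, so printf receives a float promoted to double.
       Strictly portable code would memcpy the bytes into a float first. */
    printf("Hello %f\n", *(float*)cptr);
    return 0;
}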
