nVidia CUDA code doesn't work?

Question

I am trying to learn how to program for nVidia cards. Here is my code:

__global__ void add_one(int* i)
{
    i[0]++;
}

template<class TYPE>
void gpu_load(TYPE data)
{
    int size = 1;
    cudaMalloc( (void**) &data, size * sizeof(TYPE));
}

template<class TYPE>
void copy_to_gpu(TYPE* cpu_var, TYPE* gpu_var)
{
    int size = 1;
    cudaMemcpy(  gpu_var, cpu_var, size * sizeof(TYPE), cudaMemcpyHostToDevice); 
}

template<class TYPE>
void copy_to_cpu(TYPE* cpu_var, TYPE* gpu_var)
{
    int size = 1;
    cudaMemcpy( gpu_var, cpu_var, size * sizeof(TYPE), cudaMemcpyDeviceToHost);
}

int main() 
{
    int gpu_i[1];
    int cpu_i[1];

    cpu_i[0] = 5;

    gpu_load(cpu_i);
    copy_to_gpu(cpu_i, gpu_i);

    add_one<<<1, 1>>>(gpu_i);

    int res[1];

    copy_to_cpu(res, gpu_i);

    std::cout << res[0];
}

Why does the cout print 0 instead of 5+1?

I have tried my best to make it work, but it seems like nothing happens.

score 0 · Answer 1 · answered Dec 06 '13 at 23:22


The first parameter to cudaMemcpy is always the destination, but in copy_to_cpu you pass gpu_var first.
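cudaMemcpy uses the same destination-first order as the C library's memcpy: cudaMemcpy(dst, src, count, kind). The host-only sketch below uses std::memcpy as a stand-in for cudaMemcpy, so the effect of swapping the arguments can be seen without a GPU. copy_back_wrong and copy_back_right are hypothetical names, and the two arrays only play the roles of cpu_var and gpu_var.

```cpp
#include <cstddef>
#include <cstring>

// Host-only analogy: std::memcpy(dst, src, bytes) has the same
// destination-first order as cudaMemcpy(dst, src, bytes, kind).
// Both helpers are hypothetical and exist only for this illustration.

// Arguments reversed, as in the question's copy_to_cpu:
// gpu_var is overwritten with cpu_var's contents, and cpu_var is never written.
inline void copy_back_wrong(int* cpu_var, int* gpu_var, std::size_t n)
{
    std::memcpy(gpu_var, cpu_var, n * sizeof(int));
}

// Destination (cpu_var) first, source (gpu_var) second.
inline void copy_back_right(int* cpu_var, int* gpu_var, std::size_t n)
{
    std::memcpy(cpu_var, gpu_var, n * sizeof(int));
}
```

When the arguments are reversed, the copy runs in the wrong direction and the array meant to receive the result is left untouched.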

Alan Stokes

Thanks. I now call cudaMemcpy(cpu_var, gpu_var, size * sizeof(TYPE), cudaMemcpyDeviceToHost); but it still prints 0? – SkyRipper Dec 06 '13 at 23:27
In `gpu_load` you are passing in `data`, but `cudaMalloc` writes to it and that result is then lost. – Alan Stokes Dec 06 '13 at 23:33
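Here is a host-only sketch of why the address is lost. It assumes plain std::malloc can stand in for device memory. fake_malloc is a hypothetical function that, like cudaMalloc, returns the new address through a void** out-parameter. load_by_value mirrors the question's gpu_load. load_by_ref and load_by_ptr show the two usual fixes: a reference to the pointer (as in the accepted answer) or a pointer to the pointer.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for cudaMalloc (host memory only): like cudaMalloc,
// it reports the new address through a void** out-parameter.
inline int fake_malloc(void** out, std::size_t bytes)
{
    *out = std::malloc(bytes);
    return *out != nullptr ? 0 : 1;
}

// Mirrors the question's gpu_load: `data` is a local copy of the caller's
// pointer, so the new address never reaches the caller.
template <class TYPE>
void load_by_value(TYPE* data, std::size_t n)
{
    void* p = nullptr;
    fake_malloc(&p, n * sizeof(TYPE));
    data = static_cast<TYPE*>(p);  // changes only this function's copy
    std::free(data);               // demo cleanup; the caller never sees this address
}

// Fix 1: a reference to the caller's pointer, as in the accepted answer.
template <class TYPE>
void load_by_ref(TYPE*& data, std::size_t n)
{
    void* p = nullptr;
    fake_malloc(&p, n * sizeof(TYPE));
    data = static_cast<TYPE*>(p);
}

// Fix 2: the caller passes &ptr, and the function writes through it.
template <class TYPE>
void load_by_ptr(TYPE** data, std::size_t n)
{
    void* p = nullptr;
    fake_malloc(&p, n * sizeof(TYPE));
    *data = static_cast<TYPE*>(p);
}
```

The pointer-to-pointer form matches cudaMalloc's own signature, which is why calls like cudaMalloc((void**)&ptr, bytes) pass the address of the pointer.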
Try making it work without any of your helper functions, which are just making it harder. Once it works, then make it elegant. – Alan Stokes Dec 06 '13 at 23:35
Thanks. How can I get the element type inside a template? sizeof(TYPE) returns 8 bytes when I use it with a float. – SkyRipper Dec 06 '13 at 23:44
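The 8 bytes most likely come from template argument deduction. In the question, gpu_load(TYPE data) is called with an array. The array decays to a pointer, so TYPE is deduced as int* (or float*), and a pointer is 8 bytes on a typical 64-bit host. If the parameter is declared as TYPE* instead, TYPE is deduced as the element type. The function names below are hypothetical.

```cpp
#include <cstddef>

// Parameter declared as TYPE: passing a float array deduces TYPE = float*,
// so sizeof(TYPE) is the size of a pointer.
template <class TYPE>
std::size_t elem_size_by_value(TYPE)
{
    return sizeof(TYPE);
}

// Parameter declared as TYPE*: the same call deduces TYPE = float,
// so sizeof(TYPE) is the size of one element.
template <class TYPE>
std::size_t elem_size_by_pointer(TYPE*)
{
    return sizeof(TYPE);
}
```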
Start by getting rid of your helper functions. They are just making it harder. – Alan Stokes Dec 07 '13 at 10:04
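Following that advice, here is a minimal helper-free sketch of the same program (not code from the thread). It uses one device pointer, an explicit cudaMalloc, one copy in each direction, and cudaFree. add_one_on_gpu is a hypothetical name, and error checking is left out so the flow stays visible.

```cuda
#include <cuda_runtime.h>

__global__ void add_one(int* i)
{
    i[0]++;
}

// Straight-line version with no template helpers. Error checking is omitted.
int add_one_on_gpu(int start)
{
    int* gpu_i = nullptr;  // device pointer, filled in by cudaMalloc
    cudaMalloc(reinterpret_cast<void**>(&gpu_i), sizeof(int));

    // Destination first: device <- host.
    cudaMemcpy(gpu_i, &start, sizeof(int), cudaMemcpyHostToDevice);

    add_one<<<1, 1>>>(gpu_i);

    // Destination first: host <- device. On the default stream this copy
    // waits for the kernel to finish.
    int res = 0;
    cudaMemcpy(&res, gpu_i, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(gpu_i);
    return res;
}
```

Calling std::cout << add_one_on_gpu(5); from main should print 6 on a working CUDA setup. Once this works, the copies can be moved back into helpers.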

score 0 · Accepted Answer · edited May 23 '17 at 11:49

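1. You're passing cpu_i to your cudaMalloc routine. This is not what you want: the allocation belongs in gpu_i, which must be a pointer (int *gpu_i) rather than a host array.
2. The gpu_i pointer needs to be modifiable by your cudaMalloc routine. The routine must therefore receive the pointer by reference (or receive its address) so it can write the new device address into it.
3. You had the source and destination parameters reversed in the cudaMemcpy call in copy_to_cpu.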

If the following code doesn't work for you, add proper CUDA error checking. There may also be a problem with your system configuration:

#include <iostream>

__global__ void add_one(int* i)
{
    i[0]++;
}

template<class TYPE>
// data is a reference to the caller's pointer, so the address cudaMalloc writes is kept
void gpu_load(TYPE* &data)
{
    int size = 1;
    cudaMalloc( (void**) &data, size * sizeof(TYPE));
}

template<class TYPE>
void copy_to_gpu(TYPE* cpu_var, TYPE* gpu_var)
{
    int size = 1;
    cudaMemcpy(  gpu_var, cpu_var, size * sizeof(TYPE), cudaMemcpyHostToDevice);
}

template<class TYPE>
void copy_to_cpu(TYPE* cpu_var, TYPE* gpu_var)
{
    int size = 1;
    cudaMemcpy( cpu_var, gpu_var, size * sizeof(TYPE), cudaMemcpyDeviceToHost);
}

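int main()
{
    int *gpu_i = nullptr;   // device pointer; gpu_load fills it in
    int cpu_i[1];

    cpu_i[0] = 5;

    gpu_load(gpu_i);             // allocate one int on the device
    copy_to_gpu(cpu_i, gpu_i);   // host -> device

    add_one<<<1, 1>>>(gpu_i);

    int res[1];

    copy_to_cpu(res, gpu_i);     // device -> host; waits for the kernel

    std::cout << res[0];         // should print 6
    cudaFree(gpu_i);             // release the device allocation
}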

I've modified the code to an equivalent form which does not require the `&` on the `gpu_load` parameter. — Robert Crovella, Dec 07 '13 at 02:18
Hah, filling up to the required number of characters with exclamation marks :) — Roger Dahl, Dec 07 '13 at 19:04
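The accepted answer suggests adding proper CUDA error checking if the code still fails. One common pattern, sketched here as an assumption rather than taken from the thread, wraps each runtime call in a macro. The macro compares the returned cudaError_t with cudaSuccess and prints cudaGetErrorString on failure. CUDA_CHECK and launch_add_one_checked are hypothetical names. Kernel launches return no status, so the sketch checks cudaGetLastError and cudaDeviceSynchronize after the launch.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical convenience macro, not part of the CUDA API: aborts with the
// CUDA error string, file, and line if a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error \"%s\" at %s:%d\n",          \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

__global__ void add_one(int* i)
{
    i[0]++;
}

// Launches add_one on an existing device pointer and checks both the
// launch itself and the kernel's execution.
void launch_add_one_checked(int* gpu_i)
{
    add_one<<<1, 1>>>(gpu_i);
    CUDA_CHECK(cudaGetLastError());       // errors detected at launch time
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran
}
```

In the accepted answer's main, launch_add_one_checked(gpu_i); could replace the bare launch. To check the cudaMalloc and cudaMemcpy calls as well, the helpers would need to return their cudaError_t so the caller can wrap them in CUDA_CHECK.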
