
I have two functions: the add_cpu function works fine, but the add_gpu function does not.

I tried to check some options in my GPU driver software and read my code over and over again. I tried the exact same code on another machine and it worked fine. The checkError result on the current machine is 1, which it shouldn't be, and the checkError result on my laptop is 0, which is correct. Does anyone have a suggestion as to what the problem with the graphics card or the system could be? I have no clue what the problem is here. Did I miss some sort of option?

#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include <iostream>
#include <math.h>

#define out std::cout <<
#define end << std::endl

__global__
void add_gpu( int n, float* x, float* y ) {
    for ( int i = 0; i < n; i++ ) y[i] = x[i] + y[i];
}

void add_cpu( int n, float* x, float* y ) {
    for ( int i = 0; i < n; i++ ) y[i] = x[i] + y[i];
}

void init( int n, float* x, float* y ) {
    for ( int i = 0; i < n; i++ ) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }
}

float checkError( int n, float f, float* y ) {
    float c = 0.0f;
    for ( int i = 0; i < n; i++ ) c = fmax( c, fabs( y[i] - f ) );
    return c;
}

void print( int n, float* obj, const char* str = "obj: " ) {
    out str << obj[0];
    for ( int i = 1; i < n; i++ ) out ", " << obj[i];
    out "" end;
}

int main( ) {
    int n = 1 << 5;
    float* x, * y;
    float error = 0.0f;

    cudaMallocManaged( &x, n * sizeof( float ) );
    cudaMallocManaged( &y, n * sizeof( float ) );

    init( n, x, y );
    print( n, x, "x" );
    print( n, y, "y" );
    add_gpu<<<1, 1>>>( n, x, y );
    //add_cpu(n, x, y);
    cudaDeviceSynchronize( );
    print( n, y, "y" );

    error = checkError( n, 3.0f, y );
    out "error: " << error end;

    cudaFree( x );
    cudaFree( y );

    return 0;
}
    Use [proper CUDA error checking](https://stackoverflow.com/questions/14038589/what-is-the-canonical-way-to-check-for-errors-using-the-cuda-runtime-api), and run your code with `cuda-memcheck`. – Robert Crovella Jun 18 '19 at 14:58
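The suggestion in the comment above can be sketched as a shell session; the source file and binary names here are assumptions, not taken from the question:

    nvcc -o add add.cu      # compile the posted code (file name assumed)
    cuda-memcheck ./add     # run under cuda-memcheck to surface device-side errors

On newer CUDA toolkits, `compute-sanitizer` is the replacement for `cuda-memcheck`.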

2 Answers


I don't see exactly where the problem is, but in order to debug it you should check the CUDA errors.

Most CUDA functions return a status code. You can use a small wrapper function like this to check for errors:

void checkCudaError(const cudaError_t error) {
    if (error != cudaSuccess) {
        std::cout << "Cuda error: " << cudaGetErrorString(error) << std::endl;
        // maybe do something else, e.g. exit(EXIT_FAILURE)
    }
}

and call functions like cudaMallocManaged() this way:

checkCudaError(cudaMallocManaged(&x, n * sizeof(float)));

For operations that are performed on the device (like custom kernels), launch the kernel and afterwards call

cudaGetLastError()

and wrap it in checkCudaError() as well:

checkCudaError(cudaGetLastError());

Note that cudaGetLastError() will return an error if one occurred at any earlier point, so you have to find the place where the first error occurs. That is why you should check for CUDA errors every time the GPU is used in some way.
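Applied to the kernel launch from the question, the pattern would look roughly like this (a sketch, assuming the checkCudaError() wrapper defined earlier in this answer):

    add_gpu<<<1, 1>>>( n, x, y );
    checkCudaError( cudaGetLastError( ) );      // reports launch/configuration errors
    checkCudaError( cudaDeviceSynchronize( ) ); // reports errors from kernel execution

If the kernel never actually ran on your machine, one of these two calls should print the reason, for example an architecture mismatch or a driver problem.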

Nopileos

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__MEMORY.html#group__CUDART__MEMORY_1gc263dbe6574220cc776b45438fc351e8

Without copying the data to the device, your GPU doesn't know the data, and without copying it back, your host doesn't know the results.

Narase
    Nonsense. The code, as published, is completely correct and uses Unified Memory, which is automatically accessible on both host and device without explicit memory transfers. – talonmies Jun 18 '19 at 12:19