I'm having an issue with my kernel.cu file.

When I call nvcc -v kernel.cu -o kernel.o, I get this error:

kernel.cu(17): error: identifier "atomicAdd" is undefined

My code:

#include "dot.h"
#include <cuda.h>
#include "device_functions.h" //might call atomicAdd

__global__ void dot (int *a, int *b, int *c){
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();

    if( 0 == threadIdx.x ){
        int sum = 0;
        for( int i = 0; i<THREADS_PER_BLOCK; i++)
            sum += temp[i];
        atomicAdd(c, sum);
    }
}

Any suggestions?

talonmies
Custodio

2 Answers

You need to tell nvcc to target an architecture that supports atomic memory operations (the default architecture is 1.0, which does not support atomics). Try:

nvcc -arch=sm_11 -v kernel.cu -o kernel.o

and see what happens.


EDIT in 2015 to note that the default architecture in CUDA 7.0 is now 2.0, which supports atomic memory operations, so this should not be a problem in newer toolkit versions.
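
As a minimal sketch of how this fits together, the kernel from the question can be placed in a self-contained program and built with an explicit `-arch` option. The file name `dot_example.cu`, the values of `N` and `THREADS_PER_BLOCK`, and the host code are assumptions for illustration, since `dot.h` is not shown in the question. Recent toolkits no longer accept `sm_11`, so substitute an architecture that your toolkit and GPU support.

```cuda
// dot_example.cu (hypothetical file name for this sketch)
// Build, using the flag from the answer above (adjust the architecture to your GPU):
//   nvcc -arch=sm_11 dot_example.cu -o dot_example
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 2048                 // assumed problem size, a multiple of THREADS_PER_BLOCK
#define THREADS_PER_BLOCK 512  // assumed value; the question defines this in dot.h

__global__ void dot(const int *a, const int *b, int *c) {
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();

    // Thread 0 of each block sums the block's products and
    // atomically adds the partial sum to the global result.
    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++)
            sum += temp[i];
        atomicAdd(c, sum);
    }
}

int main() {
    const size_t bytes = N * sizeof(int);
    int *a = (int *)malloc(bytes);
    int *b = (int *)malloc(bytes);
    int c = 0;
    for (int i = 0; i < N; i++) { a[i] = 1; b[i] = 2; }

    int *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, sizeof(int));
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_c, &c, sizeof(int), cudaMemcpyHostToDevice);  // start the sum at 0

    dot<<<N / THREADS_PER_BLOCK, THREADS_PER_BLOCK>>>(d_a, d_b, d_c);
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel launch failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    printf("dot = %d (expected %d)\n", c, 2 * N);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(a); free(b);
    return 0;
}
```

Note that `atomicAdd` is declared implicitly for `.cu` files compiled by nvcc, so no extra header is included here; whether it is available depends on the target architecture.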

talonmies
  • In addition, [some atomic operations](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions) are only supported by devices of compute capability higher than the current default (e.g. only supported by cc 3.5, whereas the current default is 2.0). In these cases, it is still necessary to specify an appropriate arch option, e.g. `-arch=sm_35`, and furthermore you cannot specify multiple arch options where some of them don't meet the minimum necessary for the atomic in use. – Robert Crovella Mar 19 '16 at 14:42

With the latest CUDA SDK and toolkit, this solution will not work. People also say that adding:

compute_11,sm_11; OR compute_12,sm_12; OR compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;

to CUDA in the Project Properties in Visual Studio 2010 will work. It doesn't.

You have to specify this for the .cu file itself, in its own properties (under the C++/CUDA->Device->Code Generation tab), such as:

compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;
RAS
  • When you say this solution won't work with the latest toolkit, what exactly do you mean? The solution shows a command line `nvcc` invocation with the `-arch` option. That is still perfectly valid with the CUDA 4.2 toolkits for Windows 7, Linux and OS X. – talonmies May 13 '12 at 16:24