Some issue with Atomic add in CUDA kernel operation

Question

I'm having an issue with my kernel.cu file.

When I run `nvcc -v kernel.cu -o kernel.o`, I get this error:

kernel.cu(17): error: identifier "atomicAdd" is undefined

My code:

#include "dot.h"
#include <cuda.h>
#include "device_functions.h" //might call atomicAdd

__global__ void dot (int *a, int *b, int *c){
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();

    if( 0 == threadIdx.x ){
        int sum = 0;
        for( int i = 0; i<THREADS_PER_BLOCK; i++)
            sum += temp[i];
        atomicAdd(c, sum);
    }
}

Any suggestions?

talonmies · Accepted Answer · 2015-10-11T20:18:17.970

14

You need to tell nvcc to target an architecture that supports atomic memory operations. The default architecture is 1.0, which does not support atomics. Try:

nvcc -arch=sm_11 -v kernel.cu -o kernel.o

and see what happens.

EDIT in 2015 to note that the default architecture in CUDA 7.0 is now 2.0, which supports atomic memory operations, so this should not be a problem in newer toolkit versions.
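For reference, here is a minimal self-contained sketch of the kernel from the question together with a host-side driver. Several things in it are assumptions rather than part of the original code: `THREADS_PER_BLOCK` is set to 256 (the question defines it in `dot.h`, which is not shown), the kernel takes an extra length parameter `n` so that partial blocks are handled, and `dot_on_device` is a hypothetical helper name. The `#error` guard is optional; it only makes the architecture requirement explicit when a device pass targets something below sm_11.

```cuda
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  // assumption: the question defines this in dot.h

// Fail early with a readable message if a device compilation pass targets an
// architecture without 32-bit global-memory atomics (anything below sm_11).
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ < 110)
#error "atomicAdd(int*) needs compute capability 1.1 or higher, e.g. -arch=sm_11"
#endif

__global__ void dot(const int *a, const int *b, int *c, int n) {
    __shared__ int temp[THREADS_PER_BLOCK];
    int index = threadIdx.x + blockIdx.x * blockDim.x;
    // Threads past the end of the input contribute zero.
    temp[threadIdx.x] = (index < n) ? a[index] * b[index] : 0;

    __syncthreads();

    // Thread 0 sums the block's products and adds them to the global result.
    if (threadIdx.x == 0) {
        int sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++)
            sum += temp[i];
        atomicAdd(c, sum);
    }
}

// Hypothetical host helper: copies the inputs to the device, runs the kernel
// and writes the dot product to *result. Returns false on any CUDA error.
// Assumes n > 0.
bool dot_on_device(const int *h_a, const int *h_b, int n, int *result) {
    int *d_a = 0, *d_b = 0, *d_c = 0;
    size_t bytes = n * sizeof(int);
    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess
           && cudaMalloc(&d_b, bytes) == cudaSuccess
           && cudaMalloc(&d_c, sizeof(int)) == cudaSuccess
           && cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemset(d_c, 0, sizeof(int)) == cudaSuccess;
    if (ok) {
        int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
        dot<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(result, d_c, sizeof(int), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return ok;
}
```

Calling `dot_on_device` from `main` with two 1000-element arrays filled with 1 and 2 respectively should store 2000 in `*result`. The result must be zeroed before the launch (done here with `cudaMemset`), because every block adds its partial sum into the same location.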

edited Oct 11 '15 at 20:18

answered May 13 '11 at 16:26

talonmies

In addition, [some atomic operations](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions) are only supported by devices of compute capability higher than the current default (e.g. supported only by cc 3.5, whereas the current default is 2.0). In these cases, it is still necessary to specify an appropriate arch option, e.g. `-arch=sm_35`. Furthermore, you cannot specify multiple arch options where some of them don't meet the minimum required for the atomic in use. – Robert Crovella Mar 19 '16 at 14:42
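To illustrate the comment above with a different case than the one it mentions: `atomicAdd` on `double` is, per the CUDA C Programming Guide, only available natively on compute capability 6.0 and higher. The sketch below follows the compare-and-swap pattern from that guide and selects the implementation per target with `__CUDA_ARCH__`, so the same file can be built for several architectures at once. The names `atomicAddDouble`, `sum_doubles` and `sum_on_device` are hypothetical, and the fallback assumes a device that supports 64-bit `atomicCAS`.

```cuda
#include <cuda_runtime.h>

// Hypothetical wrapper: uses the native instruction where the target has it
// and a compare-and-swap loop everywhere else.
__device__ double atomicAddDouble(double *address, double val) {
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 600)
    return atomicAdd(address, val);  // native double atomicAdd (cc 6.0+)
#else
    // Reinterpret the double as a 64-bit integer and retry the CAS until no
    // other thread has modified the value between our read and our write.
    unsigned long long int *addr = (unsigned long long int *)address;
    unsigned long long int old = *addr, assumed;
    do {
        assumed = old;
        old = atomicCAS(addr, assumed,
                        __double_as_longlong(val + __longlong_as_double(assumed)));
    } while (assumed != old);
    return __longlong_as_double(old);
#endif
}

// Every thread adds one input element into the single output value.
__global__ void sum_doubles(const double *in, double *out, int n) {
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n)
        atomicAddDouble(out, in[i]);
}

// Hypothetical host helper: sums n doubles on the device. Returns false on
// any CUDA error. Assumes n > 0.
bool sum_on_device(const double *h_in, int n, double *result) {
    double *d_in = 0, *d_out = 0;
    size_t bytes = n * sizeof(double);
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess
           && cudaMalloc(&d_out, sizeof(double)) == cudaSuccess
           && cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemset(d_out, 0, sizeof(double)) == cudaSuccess;
    if (ok) {
        sum_doubles<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(result, d_out, sizeof(double), cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

With a guard like this, a multi-architecture build such as `nvcc -gencode arch=compute_35,code=sm_35 -gencode arch=compute_60,code=sm_60` can compile, because each device pass only sees the branch its target supports. Without the guard, the pass for the lower architecture would hit the unsupported atomic and fail.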

score 1 · Answer 2 · edited Oct 20 '12 at 06:59

With the latest CUDA SDK and toolkit, this solution will not work. People also say that adding:

compute_11,sm_11; OR compute_12,sm_12; OR compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;

to CUDA in the Project Properties in Visual Studio 2010 will work. It doesn't.

You have to specify this for the .cu file itself, in its own properties (under the C++/CUDA -> Device -> Code Generation tab), for example:

compute_13,sm_13;
compute_20,sm_20;
compute_30,sm_30;

edited Oct 20 '12 at 06:59

RAS

answered May 13 '12 at 16:04

user1392334

When you say this solution won't work with the latest toolkit, what exactly do you mean? The solution shows a command line `nvcc` invocation with the `-arch` option. That is still perfectly valid with the CUDA 4.2 toolkits for Windows 7, Linux and OS X. – talonmies May 13 '12 at 16:24
