
Similar to the linked question, I am confronted with "atomicCAS" and "atomicExch" identifier-not-found errors. I searched online for solutions but still cannot solve my problem.

I also changed Code Generation to compute_20,sm_20 in the project configuration:

"Configuration Properties -> CUDA C/C++ -> Device -> Code Generation"

When I tried to "edit" this item, I found "Inherited values: compute_10,sm_10".
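
For reference, my understanding (which may be wrong) is that this project setting corresponds to an nvcc code-generation flag along these lines, with kernel.cu standing in for my source file:

nvcc -gencode arch=compute_20,code=sm_20 kernel.cu -o kernel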

Moreover, when I right-click the function "atomicCAS" or "atomicExch" and select "Go To Definition (F12)" from the menu, it points to "$(CudaToolkitIncludeDir)\sm_11_atomic_functions.h(191)".

1. How can I override the inherited compute_10,sm_10 values?
2. Why does it use atomic functions from sm_11_atomic_functions.h instead of sm_20_atomic_functions.h?
3. Alternatively, can anyone describe in detail how to solve this problem?


Thank you very much.


Update:

For example, let's take a look at the following code:

struct Lock {
    int *mutex;

    Lock( void ) {
        // Allocate the mutex in device memory and initialize it to 0 (unlocked).
        HANDLE_ERROR( cudaMalloc( (void**)&mutex, sizeof(int) ) );
        HANDLE_ERROR( cudaMemset( mutex, 0, sizeof(int) ) );
    }

    ~Lock( void ) {
        cudaFree( mutex );
    }

    __device__ void lock( void ) {
    #if __CUDA_ARCH__ >= 200
        // Spin until this thread swaps 0 -> 1, i.e. acquires the mutex.
        while( atomicCAS( mutex, 0, 1 ) != 0 );
    #endif
    }

    __device__ void unlock( void ) {
    #if __CUDA_ARCH__ >= 200
        // Release the mutex by resetting it to 0.
        atomicExch( mutex, 0 );
    #endif
    }
};

I got a message: "A definition for the symbol '__CUDA_ARCH__' could not be located."


1 Answer


First of all, make sure that the CUDA toolkit is installed correctly and that all the paths (include, lib, and bin) are set. Also check that you are including cuda_runtime.h in the .cu file.

If you think the problem is due to the target architecture, surround the kernel code with the following #if block:

//Compile kernel code for Compute 2.0 and above only
#if __CUDA_ARCH__ >= 200 

//Kernel Code Here

#endif

In this way, the kernel code will only be compiled when you are compiling for Compute 2.0 and above. For lower compute capabilities, a dummy kernel will be generated by the compiler.
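
For instance, here is a minimal sketch (the kernel name and logic are hypothetical, not from the question) of a kernel whose body is compiled only for Compute 2.0 and above:

__global__ void incrementOnce( int *flag, int *result )
{
#if __CUDA_ARCH__ >= 200
    // Only the thread that swaps 0 -> 1 first gets back 0 and writes the result.
    if( atomicCAS( flag, 0, 1 ) == 0 )
        *result = 42;
#endif
}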

If you have an alternate implementation for lower compute capabilities, you can also do this:

#if __CUDA_ARCH__ >= 200 
     //Code using  atomicCAS, atomicExch
#else
    //Alternate implementation
#endif
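
As a concrete illustration (a hypothetical helper, not part of the code above), the fallback branch could be a plain write that is only safe when a single thread updates the flag:

__device__ void setFlag( int *flag )
{
#if __CUDA_ARCH__ >= 200
    atomicExch( flag, 1 );  // atomic write, safe with concurrent writers
#else
    *flag = 1;              // fallback: assumes only one thread writes the flag
#endif
}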

Update:

Based on the code provided in the comment, you can do the following:

struct Lock
{
    int *mutex;

    __device__ Lock( void )
    {
        #if __CUDA_ARCH__ >= 200
        // Allocate the mutex on the device heap (device-side new requires compute 2.0+).
        mutex = new int;
        (*mutex) = 0;
        #endif
    }

    __device__ ~Lock( void )
    {
        #if __CUDA_ARCH__ >= 200
        delete mutex;
        #endif
    }

    __device__ void lock( void )
    {
        #if __CUDA_ARCH__ >= 200
        // Spin until this thread swaps 0 -> 1, i.e. acquires the mutex.
        while( atomicCAS( mutex, 0, 1 ) != 0 );
        #endif
    }

    __device__ void unlock( void )
    {
        #if __CUDA_ARCH__ >= 200
        // Release the mutex by resetting it to 0.
        atomicExch( mutex, 0 );
        #endif
    }
};
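
As an illustrative usage sketch (the kernel names and the accumulate example are my assumptions, not from the question, and assume compute capability 2.0 or higher): since the Lock is now device-only, a single-thread kernel can construct it on the device heap and publish it through a __device__ pointer that later kernels share:

__device__ Lock *d_lock;

__global__ void createLock( void )  { d_lock = new Lock(); }  // launch as <<<1,1>>>
__global__ void destroyLock( void ) { delete d_lock; }        // launch as <<<1,1>>>

__global__ void accumulate( int *total, const int *perBlock )
{
    // Only thread 0 of each block contends for the lock (the usual pattern
    // in "CUDA by Example") to avoid intra-warp spinning on the mutex.
    if( threadIdx.x == 0 )
    {
        d_lock->lock();
        *total += perBlock[blockIdx.x];  // critical section
        __threadfence();                 // make the update visible before releasing
        d_lock->unlock();
    }
}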
  • Thank you sgar91. Yes, I am sure the CUDA toolkit is installed correctly and all paths are set. I also include "cuda_runtime.h" as required. However, I found that __CUDA_ARCH__ is undefined. Where is it defined? Should I define it myself in some header file? – user2123066 Mar 19 '13 at 07:57
  • Are you sure you are doing it **inside** the kernel? – sgarizvi Mar 19 '13 at 07:58
  • @user2123066.. Please post the code as an update to the question, and check my updated answer. – sgarizvi Mar 19 '13 at 08:15
  • @sgar91..do you know where the macro `__CUDA_ARCH__` is defined ? – user2123066 Mar 19 '13 at 08:38
  • @user2123066.. [`__CUDA_ARCH__` is defined only in the device code](http://stackoverflow.com/a/8809924/1231073). – sgarizvi Mar 20 '13 at 08:25
  • @sgar91.. how to initialize a device class object from host ? thanks. – user2123066 Mar 27 '13 at 14:39