5

In host code, it seems that the __CUDA_ARCH__ macro won't generate different code paths; instead, only a single code path is compiled, regardless of the current device.

However, if __CUDA_ARCH__ is used within device code, it will generate a different code path for each device architecture specified in the compilation options (/arch).

Can anyone confirm this is correct?

talonmies
  • 70,661
  • 34
  • 192
  • 269
user0002128
  • 2,785
  • 2
  • 23
  • 40

2 Answers

13

__CUDA_ARCH__, when used in device code, is defined to a number that reflects the architecture currently being compiled.
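A minimal sketch of this in device code (the kernel name and the particular threshold are illustrative, not from the question): because nvcc runs one device-compilation pass per target architecture, each pass sees a different __CUDA_ARCH__ value and keeps a different branch.

```cuda
__global__ void kernel(float *data)
{
#if __CUDA_ARCH__ >= 700
    // Kept only in device passes for compute capability 7.0 (Volta) and newer
    data[threadIdx.x] *= 2.0f;
#else
    // Kept only in device passes for older architectures
    data[threadIdx.x] += 1.0f;
#endif
}
```

If the file is compiled for several architectures (e.g. with multiple -gencode options), both branches end up in the fat binary, each in the object code for its own architecture.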

It is not intended to be used in host code. From the nvcc manual:

This macro can be used in the implementation of GPU functions for determining the virtual architecture for which it is currently being compiled. The host code (the non-GPU code) must not depend on it.

Usage of __CUDA_ARCH__ in host code is therefore undefined (at least by CUDA). As pointed out by @tera in the comments, since the macro is undefined in host code, it could be used to differentiate host/device paths for example, in a __host__ __device__ function definition.

#ifndef __CUDA_ARCH__
//host code here
#else
//device code here
#endif
Robert Crovella
  • 143,785
  • 11
  • 213
  • 257
  • 5
    Actually, `__CUDA_ARCH__` is not even defined in host code (which can be used to distinguish between host and device compilation). Which is why it seems the tests in host code always appear to evaluate the same way. – tera Apr 18 '13 at 11:20
1

The macro "__CUDA_ARCH__" is used in CUDA C++ code to identify the GPU architecture for which the code is currently being compiled. This is important because different NVIDIA GPUs have different architectures and capabilities, and code can be specialized for a specific architecture to achieve maximum performance. The value of "__CUDA_ARCH__" is derived from the target compute capability as major * 100 + minor * 10. Here are some examples:

  • NVIDIA Tesla K80 (compute capability 3.7): 370
  • NVIDIA Tesla P100 (compute capability 6.0): 600
  • NVIDIA GeForce GTX 1080 Ti (compute capability 6.1): 610
  • NVIDIA Quadro P4000 (compute capability 6.1): 610

It is important to note that the set of supported values may change with different CUDA versions and updates. It is recommended to consult the CUDA documentation for the specific architecture being targeted.
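As a hedged sketch of how those values are used in practice (the kernel name is hypothetical; the thresholds follow the compute-capability mapping above):

```cuda
__global__ void dispatch(int *out)
{
#if __CUDA_ARCH__ >= 600
    // Device passes for Pascal (e.g. Tesla P100, compute capability 6.0) and newer
    *out = 600;
#elif __CUDA_ARCH__ >= 370
    // Device passes for Kepler sm_37 (e.g. Tesla K80)
    *out = 370;
#else
    // Older architectures
    *out = 0;
#endif
}
```

Each device-compilation pass evaluates the #if ladder against its own __CUDA_ARCH__ value, so only one branch survives per architecture.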

Tony.sy
  • 11
  • 2