I'm writing some platform-specific optimizations. I'm aware that I could parse the vendor string in the host code and pass the result to the kernel using the -D option, but it is perhaps more convenient to detect the vendor in the kernel directly, without host involvement (that way it is possible to optimize kernels even without access to the host source code, ...).
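For comparison, the host-side route would amount to roughly the following. This is only a minimal sketch: `vendor_build_option` is a hypothetical helper, the `-DINTEL` define is an invented name, and the substrings it matches are assumptions about what typical `CL_DEVICE_VENDOR` strings look like, not something guaranteed by the OpenCL specification.

```c
#include <string.h>

/* Hypothetical helper: maps a vendor string (e.g. as returned by
 * clGetDeviceInfo with CL_DEVICE_VENDOR) to a "-D..." build option.
 * The matched substrings are assumptions about typical vendor strings. */
static const char *vendor_build_option(const char *vendor)
{
    if (strstr(vendor, "NVIDIA"))
        return "-DNVIDIA";
    if (strstr(vendor, "Advanced Micro Devices") || strstr(vendor, "AMD"))
        return "-DAMD";
    if (strstr(vendor, "Intel"))
        return "-DINTEL"; /* invented name, for illustration only */
    return ""; /* unknown vendor: add no define */
}
```

The returned string would then be passed as the `options` argument of `clBuildProgram`, which is exactly the host involvement I would like to avoid.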

For detection inside the kernel, I have so far come up with the following:

#ifdef __NV_CL_C_VERSION
/**
 *  @def NVIDIA
 *  @brief defined when compiling on NVIDIA GPUs
 */
#define NVIDIA
#endif // __NV_CL_C_VERSION

#if defined(__WinterPark__) || defined(__BeaverCreek__) || defined(__Turks__) || \
    defined(__Caicos__) || defined(__Tahiti__) || defined(__Pitcairn__) || \
    defined(__Capeverde__) || defined(__Cayman__) || defined(__Barts__) || \
    defined(__Cypress__) || defined(__Juniper__) || defined(__Redwood__) || \
    defined(__Cedar__) || defined(__ATI_RV770__) || defined(__ATI_RV730__) || \
    defined(__ATI_RV710__) || defined(__Loveland__) || defined(__GPU__) || \
    defined(__Hawaii__)
/**
 *  @def AMD
 *  @brief defined when compiling on AMD GPUs
 *  @note This list was originally found at https://github.com/magnumripper/JohnTheRipper/wiki/Predefined-macros-in-OpenCL-(standard-and-proprietary) and copied shamelessly. It is most definitely incomplete and contains the troubling __GPU__.
 *  @note AMD also defines __CPU__ when compiling for CL_DEVICE_TYPE_CPU.
 */
#define AMD
#endif // ...

Any additions or corrections? Does anyone know what Intel defines?

the swine

1 Answer

I have just tried this on an AMD Fury X with the 1912.5 driver. All three of the following tests print their message:

#ifdef cl_amd_device_attribute_query
#pragma message "here goes AMD"
#endif

#ifdef __GPU__
#pragma message "here goes AMD GPU"
#endif

#ifdef __Fiji__
#pragma message "here goes Fiji AMD"
#endif

However, note that cl_amd_device_attribute_query is not a reliable test for an AMD device: the AMD platform also exposes the Intel CPU as a device and reports the same extension for it. Bummer.
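One way to work around this might be to combine the extension macro with the device-type macros. The following is a sketch, not a verified recipe: `AMD_GPU_DEVICE` and `AMD_CPU_DEVICE` are hypothetical names, and it assumes that the AMD platform defines `__GPU__` for GPU devices (as observed above on the Fury X) and `__CPU__` for CPU devices (as noted in the question). Whether other vendors also define these macros is untested.

```c
/* Sketch only: AMD_GPU_DEVICE and AMD_CPU_DEVICE are hypothetical names.
 * Each is defined to 1 or 0 so that it can be used in #if as well as in
 * ordinary expressions. */
#if defined(cl_amd_device_attribute_query) && defined(__GPU__)
#define AMD_GPU_DEVICE 1 /* AMD platform, GPU device */
#else
#define AMD_GPU_DEVICE 0
#endif

#if defined(cl_amd_device_attribute_query) && defined(__CPU__)
#define AMD_CPU_DEVICE 1 /* AMD platform, CPU device (possibly an Intel CPU) */
#else
#define AMD_CPU_DEVICE 0
#endif
```

Kernel code can then branch with `#if AMD_GPU_DEVICE`. Note that this still only says the code is being built by the AMD platform for a GPU, not which vendor made a CPU that the platform exposes.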

I was going through the amdocl64.dll and noticed the following:

-cl-std=CL2.0
#define __clang__ 1
#define __clang_major__ 3
#define __clang_minor__ 6
#define __ENDIAN_LITTLE__ 1
#define __SPIR32 1
#define __SPIR32__ 1
#define __STDC__ 1
#define __STDC_HOSTED__ 1
#define __STDC_VERSION__ 199901L
#define __STDC_UTF_16__ 1
#define __STDC_UTF_32__ 1
#define __OPENCL_C_VERSION__ 200
#define __OPENCL_VERSION__ 200
-Wf,--force_disable_spir
-fno-lib-no-inline
-fno-sc-keep-calls
-fno-enable-dump
-cl-internal-kernel
-cl-std=CL
-cl-std=CL1.2
-just-kernel=
-DFP_FAST_FMAF=1
-DFP_FAST_FMA=1
-cl-denorms-are-zero
cl-kernel-arg-info
-fno-bin-llvmir
-fno-image-support
-mfast-fmaf
-mfast-fma kernel-arg-alignment

Note that neither __GPU__ nor __Fiji__ is found in this DLL. Otherwise, it looks like a bunch of interesting options. Not all of them work as listed; some of them likely need to be prefixed with a -.

the swine