6

In a .cu file I've tried the following in the global scope (i.e. not in a function):

__device__ static const double cdInf = HUGE_VAL / 4;

And got nvcc error:

error : dynamic initialization is not supported for __device__, __constant__ and __shared__ variables.

How to define a C++ const/constexpr on the device, if that's possible?

NOTE1: #define is out of question not only for aesthetic reasons, but also because in practice the expression is more complex and involves an internal data type, not just double. So calling the constructor each time in each CUDA thread would be too expensive.

NOTE2: I doubt the performance of __constant__ because it's not a compile-time constant, but rather like a variable written with cudaMemcpyToSymbol.

Serge Rogatch
  • 13,865
  • 7
  • 86
  • 158
  • Possible duplicate of [Using constants with CUDA](http://stackoverflow.com/questions/16119923/using-constants-with-cuda) – m.s. Sep 12 '16 at 08:48
  • @m.s., that question limits the scope to `__constant__` and `#define`. Isn't there a way to use C++ constant on the device, as my code snippet suggests? – Serge Rogatch Sep 12 '16 at 09:05
  • @SergeRogatch: Do you actually need this to be an accessible variable on the device, at run time, with an address and everything? Because there are other alternatives to just `#DEFINE`s and `const`s. If you answer "no" I'll post a couple as an answer... – einpoklum Sep 12 '16 at 10:27
  • @einpoklum, I need it on the device (only), but I don't need its address. I would be glad to make it compile-time constant, so that on the device it turns into an immediate value. – Serge Rogatch Sep 12 '16 at 10:44

3 Answers3

9

Use a constexpr __device__ function:

#include <stdio.h>
__device__ constexpr double cdInf() { return HUGE_VAL / 4; }
__global__ void print_cdinf() { printf("in kernel, cdInf() is %lf\n", cdInf()); }
int main() { print_cdinf<<<1, 1>>>(); return 0; }

The PTX should be something like:

.visible .entry print_cdinf()(

)
{
        .reg .b64       %SP;
        .reg .b64       %SPL;
        .reg .b32       %r<2>;
        .reg .b64       %rd<7>;


        mov.u64         %rd6, __local_depot0;
        cvta.local.u64  %SP, %rd6;
        add.u64         %rd1, %SP, 0;
        cvta.to.local.u64       %rd2, %rd1;
        mov.u64         %rd3, 9218868437227405312;
        st.local.u64    [%rd2], %rd3;
        mov.u64         %rd4, $str;
        cvta.global.u64         %rd5, %rd4;
        // Callseq Start 0
        {
        .reg .b32 temp_param_reg;
        // <end>}
        .param .b64 param0;
        st.param.b64    [param0+0], %rd5;
        .param .b64 param1;
        st.param.b64    [param1+0], %rd1;
        .param .b32 retval0;
        call.uni (retval0), 
        vprintf, 
        (
        param0, 
        param1
        );
        ld.param.b32    %r1, [retval0+0];

        //{
        }// Callseq End 0
        ret;
}

With no code for the constexpr function. You could also use a constexpr __host__ function, but that's experimental in CUDA 7: use the nvcc command-line options seems to be --expt-relaxed-constexpr and see here for more details (thanks @harrism).

einpoklum
  • 118,144
  • 57
  • 340
  • 684
  • Thanks, this looks like the answer I should accept and think how to handle this later, because currently CUDA 8RC doesn't support MSVC++2015, and MSVC++2013 doesn't support `constexpr`. – Serge Rogatch Sep 12 '16 at 18:02
  • 1
    @SergeRogatch: You could try manually overriding the CUDA compiler check to work with MSVC++2015 - it might support it but just be untested. Or support enough of MSVC++2015's functionality to work for you. – einpoklum Sep 12 '16 at 18:08
  • great advice, thanks again, I haven't thought about this. As I understood `nvcc` just doesn't support some C++14 or C++17 features supported in MSVC++2015. But I can live without those features for now too. Just I need C++11. Do you know how to override this compiler check? – Serge Rogatch Sep 12 '16 at 18:25
  • 1
    Edit your `host_config.h` in the CUDA include directory. – einpoklum Sep 12 '16 at 18:41
  • 6
    You may also want to check out the experimental nvcc option `--relaxed-constexpr` which lets you leave off the__device__ from your constexpr. http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#constexpr-functions – harrism Sep 14 '16 at 22:27
2

To make the code you have shown compile and work as expected, you need to initialize the variable at runtime, not compile time. To do this, add a host side call to cudaMemcpyToSymbol, something like:

__device__ double cdInf;

// ...

double val = HUGE_VAL / 4
cudaMemcpyToSymbol(cdInf, &val, sizeof(double));

However, for a single value, passing it as a kernel argument would seem far more sensible. The compiler will automagically store the argument in constant memory on all supported architectures, and there is a "free" constant cache broadcast mechanism which should make the cost of accessing the value at runtime negligible.

talonmies
  • 70,661
  • 34
  • 192
  • 269
  • Actually, I'm already using this. Apart from the performance doubt, there is maintainability issue that the constant has to be initialized in a different place than where it is defined. So in C++ I get 3 places 1) declaration in the class 2) definition in .cpp file 3) `cudaMemcpyToSymbol` initialization . The first 2 are inevitable (except for `int`s), but I would try to get rid of the 3rd one. – Serge Rogatch Sep 12 '16 at 10:53
  • 1
    If you want a pure compile time constant and you are using g++, you can just use `const` without any cuda specifiers and access the constant in device code as an immediate constant value – talonmies Sep 12 '16 at 12:15
  • I'm using MSVC++2013: it doesn't support `constexpr`, but CUDA 8RC still doesn't support MSVC++2015. I'll try "just const" option, but I guess the compiler will give an error like "cannot access host variable form the device" for host `const`s except integers. – Serge Rogatch Sep 12 '16 at 12:19
  • 1
    I don't think it will work with visual c++. It does work with g++ – talonmies Sep 12 '16 at 12:20
  • Yes, there is another reason: the constructor of the type of the constant value is a device function. There is a pair host type, but they are different types. – Serge Rogatch Sep 12 '16 at 12:24
0

To initialize it you have to use cudaMemcpyToSymbol. It is not a compile time constant but stored in the constant memory of the device and has some advantages over global memory. From the CUDA blogspot:

For all threads of a half warp, reading from the constant cache is as fast as reading from a register as long as all threads read the same address. Accesses to different addresses by threads within a half warp are serialized, so cost scales linearly with the number of different addresses read by all threads within a half warp.

You do not need to use const, and you cannot use it. It is not a c++ constant since you need to modify it through cudaMemcpyToSymbol. So it is not a "real" constant at least from the c++ point of view. But it behaves like a constant inside the device kernels because you can modify it only through cudaMemcpyToSymbol which is callable only from host.

curious_amateur
  • 220
  • 2
  • 8
  • In terms of C++ such a variable would not be `const`: if I try to define it with `const` and an initializer, I get the error mentioned in the question. If I define it with `const` and without an initializer, I also get an error that initializer is required. So that's a loop in compiler requirements. – Serge Rogatch Sep 12 '16 at 09:24
  • All the references to constant memory here are irrelevant. – talonmies Sep 12 '16 at 10:10
  • NOTE2 is about performance, I tried to clarify performance doubts. – curious_amateur Sep 12 '16 at 12:33