I have read through many of the SO questions related to constant memory and I still don't understand why my program is not working. Overall it looks as follows:

Common.cuh

__constant__ int numElements;

__global__
void kernelFunction();

Common.cu

#include "Common.cuh"
#include <stdio.h>

__global__
void kernelFunction()
{
   printf("NumElements = %d", numElements);
}

Test.cu

#include "Common.cuh"

int main()
{
   int N = 100;
   cudaMemcpyToSymbol(numElements,&N,sizeof(int));
   kernelFunction<<<1,1>>>();
   cudaDeviceSynchronize();
   return 0;
}

It compiles with no errors, but when the kernel prints the value of numElements I just get a random value. Can someone point me in the right direction so I can understand this?

BRabbit27

1 Answer

This line:

__constant__ int numElements;

has compilation unit scope. That means if you compile it into one module, and also into another module, the two modules will have different instantiations of numElements in __constant__ memory.

The solution is to use separate compilation and linking, to device-link the two modules together, at which point the symbol will be resolved between the two modules by the device linker.

nvcc -arch=sm_20 -rdc=true -o test common.cu test.cu

example:

$ cat common.cuh
#ifndef COMMON_CU
extern __constant__ int numElements;
#endif
__global__
void kernelFunction();
$ cat common.cu
#define COMMON_CU
#include "common.cuh"
#include <stdio.h>

__constant__ int numElements;
__global__
void kernelFunction()
{
   printf("NumElements = %d\n", numElements);
}
$ cat test.cu
#define TEST_CU
#include "common.cuh"

int main()
{
   int N = 100;
   cudaMemcpyToSymbol(numElements,&N,sizeof(int));
   kernelFunction<<<1,1>>>();
   cudaDeviceSynchronize();
   return 0;
}

$ nvcc -arch=sm_20 -rdc=true -o test common.cu test.cu
$ ./test
NumElements = 100
$
Robert Crovella
  • This works fine when only **Common.cu** uses the constant variable `numElements`. However, if I also want to use the constant variable in **Test.cu**, I get `nvlink error: Multiple definition of 'numElements'` – BRabbit27 Jul 08 '14 at 06:29
  • Perhaps you should read the answer and posted code closely. test.cu *does* use the same constant variable, as it is the one doing the writing to the constant variable, and the one calling the `kernelFunction`. The multiple definition error is because `numElements` can only be defined in one place. Everywhere else must refer to it by `extern`. Please read the answer again carefully. – Robert Crovella Jul 08 '14 at 06:48
  • Note that my answer differs from your code by *more* than just the compile command. In addition, I have moved the definition of `numElements` *out* of the header file, and into one of the source files (`common.cu`). In my header file (`common.cuh`) it only refers to `numElements` by `extern`, and then *only* if that header file is included in a file other than `common.cu`. So there are code changes between my answer and yours, not just compile command differences. – Robert Crovella Jul 08 '14 at 07:26
  • As indicated in the marked duplicate question/answer, this issue can also be worked around by moving the definition of, and all explicit references to, `numElements` into a single file. All manipulation of `numElements` in other files (i.e. compilation units, or modules) must occur indirectly via wrapper functions which are defined in the file where `numElements` is defined. That method then does not require device code linking between modules. – Robert Crovella Jul 08 '14 at 07:32
  • I wanted to avoid having a large cu file, but after reading your example code carefully I think I'll have to do it that way. Thanks. – BRabbit27 Jul 08 '14 at 07:58
  • Why did you move the definition of `numElements` out of the header file? – BRabbit27 Jul 08 '14 at 14:54
  • Because there was no particular reason to have it there. We're not going to define the item in multiple places (otherwise we run into the "multiple definition" error). I can see you're struggling mightily with this. Perhaps you should try the same thing with ordinary C++ code. If you define a global variable in a header file, e.g. `int numElements;`, and include it in multiple modules that are linked together, it won't work. Example [here](http://pastebin.com/gQ8A0EbV). – Robert Crovella Jul 08 '14 at 15:05
  • I understand the difference between declaration and definition. I made a very basic [example](http://pastebin.com/g62UnCZQ) that works as I expected. Now when I try to implement the same idea into my app I get errors like `undefined reference to __cudaRegisterLinkedBinary_63_tmpxft_00001170_00000000_6_MCSimulationOptionWorstOfGPU_cpp1_ii_541945d4'`. Could it be possible to continue the discussion on a chat? – BRabbit27 Jul 08 '14 at 15:18
  • I believe I should say I am using CMake in my project and perhaps I'm missing some option in the compiler that I don't know. – BRabbit27 Jul 08 '14 at 15:22
  • Sorry, mostly I don't do chat. Furthermore I know almost nothing about CMake. My purpose is not just to help *you* but to help others who may find answers on SO via google or some other means. (Chat doesn't help that at all.) Since your basic example works, and the error you are now receiving looks completely different, I suggest posting a new SO question. – Robert Crovella Jul 08 '14 at 15:23
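
For readers landing here: the single-file workaround mentioned in the comments (keep the definition of `numElements` and every direct reference to it in one file, and let other files go through wrapper functions) might look roughly like the sketch below. The wrapper names `setNumElements` and `getNumElements` are hypothetical and do not appear in the answer; this is only an illustration under that assumption, not the code from the marked duplicate.

```cuda
// common.cu (sketch): the only file that refers to numElements directly.
#include <cstdio>
#include <cuda_runtime.h>

// Single definition of the constant-memory symbol, local to this compilation unit.
__constant__ int numElements;

__global__
void kernelFunction()
{
   printf("NumElements = %d\n", numElements);
}

// Hypothetical host wrapper: other files call this instead of
// calling cudaMemcpyToSymbol on numElements themselves.
cudaError_t setNumElements(int n)
{
   return cudaMemcpyToSymbol(numElements, &n, sizeof(int));
}

// Hypothetical host wrapper that copies the current value back to the host.
cudaError_t getNumElements(int *n)
{
   return cudaMemcpyFromSymbol(n, numElements, sizeof(int));
}
```

With something along these lines, `common.cuh` would declare only `kernelFunction` and the wrapper prototypes, and `main` in test.cu would call `setNumElements(N)` in place of `cudaMemcpyToSymbol`. Since no device symbol is then shared between compilation units, `-rdc=true` should not be needed for this variant.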