I need to split my code across several .cu files. Where should I define device variables that are used by more than one .cu file?

An example

File common.h

__device__ int x;

File A.cu

__global__ void a() 

File B.cu

__global__ void b() 

Both a() and b() use x. What should I do?

In C, I would write something like extern __device__ int x; in the header and then define __device__ int x; in one other place. But in CUDA I cannot do that. If I do, the compiler tells me ‘..........’ previously declared here

worldterminator

1 Answer

EDIT: @talonmies was right (as usual), so I've deleted my comment about CUDA 4.1.

Furthermore, the compile commands I gave were not quite right, so let me replace my answer with one that demonstrably works and has the proper instructions.

You need CUDA 5.0 and a compute capability 2.0 or greater device for this to work.

I'm sure there's probably a better way, but this seems to work for me:

com.h:

#ifndef DEVMAIN
extern __device__ int x;
#endif

a.cu:

#include "com.h"
__global__ void a(){

  x = -5;
}

b.cu:

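#include <stdio.h>
#define DEVMAIN              // hide the extern declaration in com.h for this file
#include "com.h"

extern __global__ void a();  // kernel defined in a.cu
__device__ int x;            // the single definition of x

__global__ void b(){
  x = 5;
}

int main() {
  int temp=7;
  cudaMemcpyToSymbol(x,&temp, sizeof(int));    // initialize x on the device
  a<<<1,1>>>();                                // a() sets x to -5
  cudaMemcpyFromSymbol(&temp,x,sizeof(int));   // copy x back to the host
  printf("in host : %d\n",temp);
  b<<<1,1>>>();                                // b() sets x to 5
  cudaMemcpyFromSymbol(&temp,x,sizeof(int));
  printf("in host2 : %d\n",temp);
  return 0;
}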

compiling:

nvcc -arch=sm_20 -dc a.cu
nvcc -arch=sm_20 -dc b.cu
nvcc -arch=sm_20 -o ab a.o b.o

output:

$ ./ab
in host : -5
in host2 : 5
$
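
The listing above ignores the return values of the runtime calls, so a failed copy or kernel launch could go unnoticed and `temp` could simply keep an old value. A minimal single-file sketch of adding checks might look like the following. `CUDA_CHECK` is a hypothetical helper macro (it is not part of the CUDA toolkit), and `x` and `a()` are folded into one file here only to keep the example self-contained; in the multi-file layout they would stay in b.cu and a.cu.

```cuda
// check_demo.cu: hypothetical file name, used only for this sketch
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// CUDA_CHECK is an assumed helper, not a CUDA API. It prints the error
// string and exits if a runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                      \
  do {                                                        \
    cudaError_t err_ = (call);                                \
    if (err_ != cudaSuccess) {                                \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,      \
              cudaGetErrorString(err_));                      \
      exit(EXIT_FAILURE);                                     \
    }                                                         \
  } while (0)

__device__ int x;  // in the multi-file version this definition lives in b.cu

__global__ void a() { x = -5; }

int main() {
  int temp = 7;
  CUDA_CHECK(cudaMemcpyToSymbol(x, &temp, sizeof(int)));
  a<<<1, 1>>>();
  CUDA_CHECK(cudaGetLastError());       // reports launch failures
  CUDA_CHECK(cudaDeviceSynchronize());  // reports failures during execution
  CUDA_CHECK(cudaMemcpyFromSymbol(&temp, x, sizeof(int)));
  printf("in host : %d\n", temp);
  return 0;
}
```

If any call fails, the program stops with a file, line, and error string instead of printing a value that was never updated.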

Sorry for my previous errors.

Robert Crovella
  • Worth pointing out this is probably only legal in CUDA 5 which has a proper device code linker. – talonmies Nov 28 '12 at 07:40
  • @RobertCrovella I tried your code. Could you please try this version? It should print 1, -1, but it prints 1, 1: `int main() { int temp; b<<<1,1>>>(); cudaMemcpyFromSymbol(&temp,x,sizeof(int),0,D_T_H); printf("in host : %d\n",temp); a<<<1,1>>>(); cudaMemcpyFromSymbol(&temp,x,sizeof(int),0,D_T_H); printf("in host2 : %d\n",temp); cudaDeviceSynchronize(); return 0; }` – worldterminator Nov 29 '12 at 02:27
  • @RobertCrovella and D_T_H is `#define D_T_H cudaMemcpyDeviceToHost` – worldterminator Nov 29 '12 at 02:28
  • @worldterminator I have made several mistakes in my previous answer so I have fixed those (I hope) and edited my answer with the changes. – Robert Crovella Nov 29 '12 at 04:13
  • Yeah, it works! BTW, could you tell me what -dc means? How does it differ from -c? – worldterminator Nov 29 '12 at 06:26
  • You'll find the answer [here](http://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#code-changes-for-separate-compilation). With CUDA 5, we can now link device code separately. In order to do this, the compiled device code has to be delivered to the linker in a relocatable format. -dc instructs the compiler to generate relocatable device code which can be linked later. – Robert Crovella Nov 29 '12 at 06:53