To use the Unified Memory feature in CUDA 6, the following requirements must be met:
- a GPU with SM architecture 3.0 or higher (Kepler class or newer)
- a 64-bit host application and operating system, except on Android
- Linux or Windows
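The hardware and driver requirements above can be checked at run time. The following is only a minimal sketch, assuming device 0 is the GPU in question; it uses cudaGetDeviceProperties and the cudaDevAttrManagedMemory attribute from the CUDA runtime API, and the printed labels are my own.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

int main() {
    int dev = 0;  // assumption: the GPU of interest is device 0

    // Compute capability must be 3.0 or higher for Unified Memory.
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device %d: %s, compute capability %d.%d\n",
           dev, prop.name, prop.major, prop.minor);

    // Ask the runtime whether this device supports managed memory.
    int managed = 0;
    err = cudaDeviceGetAttribute(&managed, cudaDevAttrManagedMemory, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaDeviceGetAttribute: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Managed memory supported: %s\n", managed ? "yes" : "no");

    // Driver and runtime versions, to confirm both are at CUDA 6.0 or later.
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("Driver API version: %d, runtime version: %d\n",
           driverVersion, runtimeVersion);

    // A 64-bit host build has 8-byte pointers.
    printf("Host pointer size: %d bytes\n", (int)sizeof(void *));
    return 0;
}
```

If the attribute reports "no", or the compute capability is below 3.0, the managed variable cannot be expected to be accessible from the host.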
My setup is:
- System: Ubuntu 13.10 (64-bit)
- GPU: GTX 770
- CUDA: 6.0
- Driver Version: 331.49
The sample code is taken from the programming guide, page 210:
#include <stdio.h>

// Managed array, meant to be accessible from both host and device.
__device__ __managed__ int ret[1000];

__global__ void AplusB(int a, int b) {
    ret[threadIdx.x] = a + b + threadIdx.x;
}

int main() {
    AplusB<<< 1, 1000 >>>(10, 100);
    cudaDeviceSynchronize();  // wait for the kernel before reading ret on the host
    for (int i = 0; i < 1000; i++)
        printf("%d: A+B = %d\n", i, ret[i]);
    return 0;
}
The nvcc compile options I used are:
nvcc -m64 -Xptxas=-Werror -arch=compute_30 -code=sm_30 -o UM UnifiedMem.cu
This code compiles without errors, but at run time it produces a segmentation fault at printf(). It looks as if the Unified Memory feature did not take effect: the address of ret still seems to be a GPU address, while printf is called on the CPU. The CPU is then trying to access data that is not mapped on the host, which would explain the segmentation fault. Can anybody help me? What is wrong here?
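To narrow this down, the same sample can be run with the return code of every runtime call checked, so that a failed launch or synchronization is reported before the host touches ret. This is only a diagnostic sketch, not a fix; CHECK is a hypothetical helper macro of my own and is not part of the CUDA API.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// CHECK is a hypothetical helper (not a CUDA API): it aborts with the
// runtime's error string if a call does not return cudaSuccess.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

__device__ __managed__ int ret[1000];

__global__ void AplusB(int a, int b) {
    ret[threadIdx.x] = a + b + threadIdx.x;
}

int main() {
    AplusB<<<1, 1000>>>(10, 100);
    CHECK(cudaGetLastError());       // reports launch-time errors
    CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    // Only read ret on the host after both checks have passed.
    printf("ret[0] = %d, ret[999] = %d\n", ret[0], ret[999]);
    return 0;
}
```

If either check fails, the error string should indicate whether the problem lies in the launch or in the driver setup rather than in the host-side access itself.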