
I'm learning how to use multiple GPUs in my CUDA application. I tried out a simple program which ran successfully on a system with two Tesla C2070s. But when I ran the same program on a different system, with a Tesla K40c and a Tesla C2070, it gives a segmentation fault. What might be the problem? I'm sure there is no problem with the code. Is there any setting I need to change in the environment? I have attached my code here for your reference.

#include <stdio.h>
#include "device_launch_parameters.h"
#include "cuda_runtime_api.h"

// Each thread doubles one element of the array.
__global__ void testA(int *a)
{
   int i = blockIdx.x * blockDim.x + threadIdx.x;
   a[i] = a[i] * 2;
}

int main()
{
   int *ai, *bi, *ao, *bo;
   int iter;
   cudaStream_t streamA, streamB;

   // Device 0: create a stream, a device buffer, and a pinned host buffer.
   cudaSetDevice(0);
   cudaStreamCreate(&streamA);
   cudaMalloc((void**)&ao, 10 * sizeof(int));
   cudaHostAlloc((void**)&ai, 10 * sizeof(int), cudaHostAllocMapped);
   for(iter=0; iter<10; iter++)
   {
       ai[iter] = iter+1;
   }

   // Device 1: same setup.
   cudaSetDevice(1);
   cudaStreamCreate(&streamB);
   cudaMalloc((void**)&bo, 10 * sizeof(int));
   cudaHostAlloc((void**)&bi, 10 * sizeof(int), cudaHostAllocMapped);
   for(iter=0; iter<10; iter++)
   {
       bi[iter] = iter+11;
   }

   // Device 0: copy in, run the kernel, copy out, all asynchronously on streamA.
   cudaSetDevice(0);
   cudaMemcpyAsync(ao, ai, 10 * sizeof(int), cudaMemcpyHostToDevice, streamA);
   testA<<<1, 10, 0, streamA>>>(ao);
   cudaMemcpyAsync(ai, ao, 10 * sizeof(int), cudaMemcpyDeviceToHost, streamA);

   // Device 1: same sequence on streamB.
   cudaSetDevice(1);
   cudaMemcpyAsync(bo, bi, 10 * sizeof(int), cudaMemcpyHostToDevice, streamB);
   testA<<<1, 10, 0, streamB>>>(bo);
   cudaMemcpyAsync(bi, bo, 10 * sizeof(int), cudaMemcpyDeviceToHost, streamB);

   // Wait for both streams to finish before reading results on the host.
   cudaSetDevice(0);
   cudaStreamSynchronize(streamA);

   cudaSetDevice(1);
   cudaStreamSynchronize(streamB);

   printf("%d %d %d %d %d\n",ai[0],ai[1],ai[2],ai[3],ai[4]);
   printf("%d %d %d %d %d\n",bi[0],bi[1],bi[2],bi[3],bi[4]);
   return 0;
}

The segmentation fault occurs when the bi array is initialized inside the for loop, which means memory was never actually allocated for bi.
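
(For reference: a minimal error-checking sketch; the checkCuda helper below is illustrative, not part of the CUDA API. Wrapping each runtime call like this is what exposes the 'Uncorrectable ECC error' code mentioned in the comments below.)

#include <stdio.h>
#include <stdlib.h>
#include "cuda_runtime_api.h"

// Illustrative helper: print the error string and abort on any failure.
static void checkCuda(cudaError_t err, const char *what)
{
   if (err != cudaSuccess)
   {
       printf("%s failed: %s\n", what, cudaGetErrorString(err));
       exit(1);
   }
}

// Example usage around the device 1 allocations:
// checkCuda(cudaMalloc((void**)&bo, 10 * sizeof(int)), "cudaMalloc bo");
// checkCuda(cudaHostAlloc((void**)&bi, 10 * sizeof(int), cudaHostAllocMapped), "cudaHostAlloc bi");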

Vijay
  • Compiled and ran fine on a K20 and a Quadro K5000 with CUDA 6.5. Got the following result: 2 4 6 8 10 22 24 26 28 30 – Christian Sarofeen Dec 23 '14 at 19:25
  • Thanks Park, Christian. I saw the error code, and it is 'Uncorrectable ECC error encountered'. This error occurs when I try to allocate memory on the second GPU (cudaMalloc((void**)&bo, 10 * sizeof(int));). I disabled ECC support and it ran fine on the system having one K40c and one Tesla C2070. Is it because the architectures are different? – Vijay Dec 24 '14 at 06:35

1 Answer


With the new information you've provided based on the error checking, the problem you were having was due to the ECC error.

When a GPU has a double-bit ECC error detected in the current session, it is no longer usable for compute activities until either:

  1. the GPU is reset (e.g. via system reboot, via driver unload/reload, or manually via nvidia-smi, etc.), or

  2. ECC is disabled (which usually also requires a system reboot or GPU reset).

You can review the ECC status of your GPUs with the nvidia-smi command. You probably already know which GPU was reporting the ECC error, since you disabled ECC; but in case not, based on your initial report it would be the one associated with the cudaSetDevice(1); call, which probably should have been the Tesla C2070 (i.e. not the K40).
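
For reference, ECC status can also be queried programmatically; a minimal sketch using cudaGetDeviceProperties (the ECCEnabled field is 1 when ECC is enabled on that device):

#include <stdio.h>
#include "cuda_runtime_api.h"

int main()
{
   int n = 0;
   cudaGetDeviceCount(&n);
   for (int dev = 0; dev < n; dev++)
   {
       cudaDeviceProp prop;
       cudaGetDeviceProperties(&prop, dev);
       // Report each device's name and whether ECC is currently enabled.
       printf("device %d: %s, ECC %s\n", dev, prop.name,
              prop.ECCEnabled ? "enabled" : "disabled");
   }
   return 0;
}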

Robert Crovella
  • Thanks Robert. That's correct, the second device is the Tesla C2070. I just saw this document: [Click here](http://www.nvidia.in/content/quadro/maximus/maximus-multi-gpu-faq.pdf), where it is specified that mixing GPUs from different architectures is not recommended. Thanks. – Vijay Dec 24 '14 at 17:03
  • Your link pertains to [Maximus](http://www.nvidia.com/object/multi-gpu-technology.html), which is really separate from general usage of Tesla GPUs. There are certain technical issues (e.g. relating to P2P transfers) when mixing GPUs from different generations, so it's probably not a great idea on a large scale, but it's not the source of the problem you're reporting here. – Robert Crovella Dec 24 '14 at 17:16
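
For reference, one concrete check when mixing GPU generations is whether the two devices can do P2P transfers at all. A minimal sketch using cudaDeviceCanAccessPeer (device IDs 0 and 1 assumed, as in the question):

#include <stdio.h>
#include "cuda_runtime_api.h"

int main()
{
   int canAccess01 = 0, canAccess10 = 0;
   // Ask whether device 0 can directly access device 1's memory, and vice versa.
   cudaDeviceCanAccessPeer(&canAccess01, 0, 1);
   cudaDeviceCanAccessPeer(&canAccess10, 1, 0);
   printf("P2P 0->1: %s, 1->0: %s\n",
          canAccess01 ? "yes" : "no",
          canAccess10 ? "yes" : "no");
   return 0;
}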