I have some CUDA code written in C++ that works well on a single GPU. However, I have two GPUs in my system and I want to use both of them.
After looking through NVIDIA's documentation, I found this code on page 42 for allocating memory on multiple GPUs:
int gpu_A = 0;
int gpu_B = 1;
cudaSetDevice( gpu_A );
cudaMalloc( &d_A, num_bytes);
That worked. Based on the same logic, I tried this for the memcpy:
int gpu_A = 0;
int gpu_B = 1;
cudaSetDevice( gpu_A );
cudaMemcpy(gpuPointer, cpuArray, sizeof(int)*number, cudaMemcpyHostToDevice);
That seemed to work. However, I later found what looks like the actual memcpy code in the docs (page 18):
for(int i=0; i<num_gpus-1; i++ )
    cudaMemcpy(d_a[i+1], gpu[i+1], d_a[i], gpu[i], num_bytes);
When I try to compile it, I get:
error : argument of type "int" is incompatible with parameter of type "void *"
I searched the existing answers on this site and found a similar question.
This is the relevant code from it:
double *dev_a[2], *dev_b[2], *dev_c[2];
const int Ns[2] = {N/2, N-(N/2)};
// copy the arrays 'a' and 'b' to the GPUs
for(int dev=0,pos=0; dev<2; pos+=Ns[dev], dev++) {
    cudaSetDevice(dev);
    cudaMemcpy( dev_a[dev], a+pos, Ns[dev] * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy( dev_b[dev], b+pos, Ns[dev] * sizeof(double), cudaMemcpyHostToDevice);
}
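To check that I understand the indexing in that snippet, here is a minimal host-only sketch (no CUDA calls) of how the array gets split into one slice per GPU. `Chunk` and `split_for_devices` are names I made up for illustration; they are not part of CUDA or of the quoted answer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous slice of the host array, destined for one device.
struct Chunk {
    std::size_t offset;  // index of the first element in the host array
    std::size_t count;   // number of elements in this slice
};

// Hypothetical helper (not a CUDA API): split n elements into num_devices
// slices, giving any remainder to the last device. For two devices this
// yields the same sizes as Ns[2] = {N/2, N-(N/2)} in the quoted snippet.
std::vector<Chunk> split_for_devices(std::size_t n, int num_devices) {
    assert(num_devices > 0);
    std::vector<Chunk> chunks;
    const std::size_t base = n / static_cast<std::size_t>(num_devices);
    std::size_t pos = 0;  // plays the role of 'pos' in the quoted loop
    for (int dev = 0; dev < num_devices; ++dev) {
        // The last device takes whatever is left over.
        const std::size_t count = (dev == num_devices - 1) ? n - pos : base;
        chunks.push_back({pos, count});
        pos += count;
    }
    return chunks;
}
```

If I read the quoted loop correctly, each slice would then be copied after `cudaSetDevice(dev)` with `cudaMemcpy(dev_a[dev], a + chunk.offset, chunk.count * sizeof(double), cudaMemcpyHostToDevice)`, so every GPU receives a different part of the array rather than a full copy.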
However, when I try to compile it, I get the same error.
How are you supposed to perform a memcpy onto multiple GPUs? Is the way I did it the first time correct?
I mean, it seems to work, but is it best practice?
Edit --------
Simple complete example provided:
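int *cpuArray = (int*)malloc(sizeof(int)*720);
// One device pointer per GPU; a single shared pointer would be
// overwritten by the second cudaMalloc
int *gpuArray[2];
// Allocate memory on the GPUs
int dev;
for(dev=0; dev<2; dev++) {
    cudaSetDevice(dev);
    cudaMalloc((void**) &gpuArray[dev], sizeof(int)*720);
}
// Copy the array to the GPUs
// (CUDA_CALL is assumed to be an error-checking macro defined elsewhere)
for(dev=0; dev<2; dev++) {
    cudaSetDevice(dev);
    CUDA_CALL(cudaMemcpy(gpuArray[dev], cpuArray, sizeof(int)*720, cudaMemcpyHostToDevice));
}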