
I have some CUDA code, based on C++, that works well on a single GPU. However, I have two GPUs in my system and I want to use them both.

After looking through NVIDIA's documentation, on page 42 I found this code to allocate memory on multiple GPUs:

int gpu_A = 0;
int gpu_B = 1;
cudaSetDevice( gpu_A );
cudaMalloc( &d_A, num_bytes);

That worked successfully. So, based on the same logic, I tried this for the memcpy:

int gpu_A = 0;
int gpu_B = 1;
cudaSetDevice( gpu_A );
cudaMemcpy(gpuPointer, cpuArray, sizeof(int)*number, cudaMemcpyHostToDevice);

That seemed to work. However, I later found the actual memcpy code provided by the docs (page 18):

for(int i=0; i<num_gpus-1; i++ )
cudaMemcpy(d_a[i+1], gpu[i+1], d_a[i], gpu[i], num_bytes);

When I try to compile it I get

error : argument of type "int" is incompatible with parameter of type "void *"

I searched the questions here and found a similar one that had been asked before.

This is the relevant code provided:

double *dev_a[2], *dev_b[2], *dev_c[2];
const int Ns[2] = {N/2, N-(N/2)};

// copy the arrays 'a' and 'b' to the GPUs
for(int dev=0,pos=0; dev<2; pos+=Ns[dev], dev++) {
    cudaSetDevice(dev);
    cudaMemcpy( dev_a[dev], a+pos, Ns[dev] * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy( dev_b[dev], b+pos, Ns[dev] * sizeof(double), cudaMemcpyHostToDevice);
}

However, when I try to compile it, I get the same error.

How are you supposed to perform a memcpy onto multiple GPUs? Is the way I did it the first time correct?

I mean, it seems to work, but is it best practice?

Edit --------

A simple, complete example:

int *cpuArray= (int*)malloc(sizeof(int)*720);
int *gpuArray;

// Allocate memory on the GPUs
int dev;
for(dev=0; dev<2; dev++) {
   cudaSetDevice(dev);
   cudaMalloc((void**) &gpuArray, sizeof(int)*720);
}

// Copy the array to the GPUs
for(dev=0; dev<2; dev++) {
  cudaSetDevice(dev);
  CUDA_CALL(cudaMemcpy(gpuArray, cpuArray, sizeof(int)*720, cudaMemcpyHostToDevice));
}
YAHsaves
  • please include the definitions of all relevant variables (e.g.: `dev_a`, `a`, etc.), or better a [mcve] that we can try to compile ourselves – UnholySheep Jan 11 '18 at 14:32
  • @UnholySheep I edited my answer to include what you requested. – YAHsaves Jan 11 '18 at 14:36
  • @talonmies, I apologize I didn't realize I left something important out. Sorry I am still new to c/c++ so much that I don't even know the difference between the languages. Reviewing your answer again I can't see what I left out. Was it the malloc commands? Or can you help me get your code working? – YAHsaves Jan 11 '18 at 16:18
  • The code in that answer https://stackoverflow.com/questions/10529972/multi-gpu-basic-usage compiles without error -- https://pastebin.com/vaBDndtX. I don't know what the *actual* code you are compiling is, but it isn't what is in that answer, nor what is in your question either. – talonmies Jan 11 '18 at 16:40
  • do you know what square brackets are: `[ ]` and what they signify in the C language? If you don't, you should study C programming first. If you do understand them, note that they are present on **every** variable definition in your code snippet after "relevant code provided" but not on any variable definition in your code snippet after "complete example provided". You can't run `cudaMalloc` on the same `gpuArray` variable in your loop and expect anything sensible to come from that. You need a separate `gpuArray` variable for each GPU, and the square bracket syntax is one way to get there. – Robert Crovella Jan 12 '18 at 03:33
  • So is this question abandoned? Did you solve the problem yourself and if you did, are you planning to answer your own question. Otherwise this question should be closed or deleted. – talonmies Dec 02 '19 at 15:37

0 Answers