
Hello fellow StackOverflow Users,

I have this problem: I have one very big image that I want to work on. My first idea is to divide the big image into several sub-images and then send these sub-images to different GPUs. I don't use the image object, because I don't work with the RGB values; I only use the brightness value to manipulate the image.

My questions are:

  1. Can I use one context with many command queues for every device, or should I use one context with one command queue for each device?
  2. Can anyone give me an example or ideas on how I can dynamically change the input memory data (the sub-image data) when setting up the kernel arguments for each device? (I only know how to send the same input data to all of them.)
  3. For example, if I have more sub-images than GPUs, how can I distribute the sub-images to the GPUs?
  4. Or is there maybe another, smarter approach?

I'd appreciate any help and ideas. Thank you very much.

aelias

2 Answers

  1. Use one context and many queues. The simplest method is one queue per device.
  2. Create one program, and one kernel per device (created from the same program). Then create different buffers (one per device) and set each kernel's arguments to its own buffer. Now you have different kernels, and you can enqueue them in parallel with different arguments.
  3. To distribute the jobs, simply use the event system: check whether a GPU is idle and queue the next job there.

I can provide a more detailed example with code, but as a general sketch, that is the way to follow.
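As a minimal host-side sketch of points 2 and 3 (OpenCL setup calls are omitted; the names `Tile` and `splitIntoTiles` are made up for illustration), you could split the big brightness image into horizontal strips and assign them round-robin to the per-device queues:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One sub-image: a horizontal strip of the big brightness buffer.
struct Tile {
    std::size_t rowOffset;  // first row of the strip in the full image
    std::size_t rows;       // number of rows in the strip
    int device;             // queue/device index the strip is assigned to
};

// Split an image of `height` rows into `numTiles` strips and assign
// them round-robin across `numDevices` command queues. The division
// remainder is spread over the first strips so every row is covered.
std::vector<Tile> splitIntoTiles(std::size_t height, int numTiles, int numDevices) {
    std::vector<Tile> tiles;
    std::size_t base = height / numTiles;
    std::size_t rem  = height % numTiles;
    std::size_t row  = 0;
    for (int i = 0; i < numTiles; ++i) {
        std::size_t rows = base + (static_cast<std::size_t>(i) < rem ? 1 : 0);
        tiles.push_back({row, rows, i % numDevices});
        row += rows;
    }
    return tiles;
}
```

Each tile would then get its own buffer (created from the host pointer plus `tile.rowOffset * width`), set as the kernel argument for that tile's device, and enqueued on that device's queue.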

DarkZeros
  • Hello DarkZeros. Thanks for your help, I will test it right away. – aelias Sep 19 '13 at 09:08
  • Hi @DarkZeros, about number 3: what would the event system look like, and where should I put the event that checks whether my images have already been processed? I cannot put the event on enqueueNDRangeKernel, because I already have an event set there for combining the results (with the clWaitForEvents function). Can you maybe give me an example? Or maybe I'm misunderstanding your point here. Thank you – aelias Sep 19 '13 at 12:25
  • Sorry for the late reply. You can set an event each time you queue a kernel to a device. Then, the next time you have to queue another kernel, just check the event statuses (search for events that have completed). That will give you a device which is already free of work, and you can queue the kernel to that one. – DarkZeros Nov 10 '13 at 19:57
  • I forgot to say that you may have lots of events; you are not forced to have only one. I typically attach an event to each call I make, then use only the ones I really need. (Of course, I use the C++ wrappers; otherwise it would be hell to keep track of all of them.) – DarkZeros Nov 10 '13 at 23:46
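The polling scheme described in the comments above can be simulated host-side (no OpenCL calls here; the `finished` flags stand in for the completion status of the `cl_event` returned by each `clEnqueueNDRangeKernel` call):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pick the first device whose last queued kernel has completed.
// In real code, `finished[d]` would come from querying the status of
// that device's most recent event; returns -1 if all devices are busy,
// in which case the host would wait on the events and retry.
int pickFreeDevice(const std::vector<bool>& finished) {
    for (std::size_t d = 0; d < finished.size(); ++d)
        if (finished[d]) return static_cast<int>(d);
    return -1;
}
```

The host loop then queues the next sub-image to the returned device and records the new event for that device, repeating until all sub-images are processed.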

The AMD APP SDK has a few samples on multi-GPU handling. You should look at these two:

  1. SimpleMultiDevice: shows how to create multiple command queues on a single context, with some performance results.
  2. BinomialOptionMultiGPU: look at the loadBalancing method. It divides the buffer based on the compute units and max clock frequency of the available GPUs.
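The idea behind that load balancing can be sketched as follows (a simplified version, not the SDK's actual code; in real code the compute-unit count and clock frequency come from `clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS` and `CL_DEVICE_MAX_CLOCK_FREQUENCY`):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Split `total` work items across devices in proportion to
// computeUnits[i] * clockMHz[i]; any rounding remainder goes
// to the last device so the shares always sum to `total`.
std::vector<std::size_t> loadBalance(std::size_t total,
                                     const std::vector<unsigned>& computeUnits,
                                     const std::vector<unsigned>& clockMHz) {
    std::size_t weightSum = 0;
    for (std::size_t i = 0; i < computeUnits.size(); ++i)
        weightSum += static_cast<std::size_t>(computeUnits[i]) * clockMHz[i];

    std::vector<std::size_t> share(computeUnits.size());
    std::size_t assigned = 0;
    for (std::size_t i = 0; i < share.size(); ++i) {
        std::size_t weight = static_cast<std::size_t>(computeUnits[i]) * clockMHz[i];
        share[i] = total * weight / weightSum;
        assigned += share[i];
    }
    share.back() += total - assigned;
    return share;
}
```

A device with twice the compute units at the same clock would receive roughly twice the work items.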
Krishnaraj