
Basically, I have two GPUs and I want to execute some kernels on each of them. I don't want the GPUs to be working on the same kernel with each doing some part of it (I don't know if this is even possible); in any case, I don't want to see that behavior.

I just want to make sure that both devices are being exercised. I have created a context and command queues for both of them, but I see only one kernel being executed, which means only one device is being used. This is how I have done it:

cl_device_id *device;
cl_kernel *kernels;
...
// creating context.  
context = clCreateContext(0, num_devices, device, NULL, NULL, &error);
...
// creating command queues for all kernels
for(int i = 0; i<num_kenrels; i++)
    cmdQ[i] = clCreateCommandQueue(context, *device, 0, &error);
...
// enqueue kernels 
error = clEnqueueNDRangeKernel(*cmdQ, *kernels, 2, 0, glbsize, 0, 0, NULL, NULL);

Am I going the correct way?

Nike

1 Answer


It depends on how you actually filled your device array. If you initialized it correctly, creating one context that spans the devices is correct.

Unfortunately, you have the wrong idea about kernels and command queues. A kernel is created from a program for a particular context. A queue, on the other hand, is used to communicate with a particular device. What you want is one queue per device, not one per kernel:

for (int i = 0; i < num_devices; i++)
    cmdQ[i] = clCreateCommandQueue(context, device[i], 0, &error);

Now you can enqueue the different (or same) kernels on different devices via the corresponding command queues:

clEnqueueNDRangeKernel(cmdQ[0], kernels[0], /* ... */);
clEnqueueNDRangeKernel(cmdQ[1], kernels[1], /* ... */);

To sum up the terms:

  • A cl_context is created for a particular cl_platform_id and is like a container for a subset of devices,
  • a cl_program is created and built for a cl_context and its associated devices,
  • a cl_kernel is extracted from a cl_program but can only be used on devices associated with the program's context,
  • a cl_command_queue is created for a specific device belonging to a certain context,
  • memory operations and kernel calls are enqueued in a command queue and executed on the corresponding device.
matthias
  • Agreed. Also note that different implementations handle distributing workloads across multiple devices differently (sometimes clEnqueueNDRangeKernel even blocks, see http://stackoverflow.com/questions/11562543/clenqueuendrange-blocking-on-nvidia-hardware-also-multi-gpu/11562814#comment15294577_11562814), and in some cases a shared context might perform worse. To truly separate processing across two devices and control it all yourself, you can use two contexts, each created with one device. – Ani Jul 25 '12 at 20:50
  • Thank you. Just a quick question, can I not use a single clEnqueueNDRangeKernel statement to start both the kernels? – Nike Jul 25 '12 at 23:43
  • No, not really. But inside an OpenCL program you can call other functions defined in the same program, if you really just want to split up the logic. – matthias Jul 26 '12 at 06:29
  • @matthias: I tried running the kernels on different devices exactly as you suggested. I have even kept a separate context for each device. The first kernel executes correctly, but I get an "Access violation reading location" error when the second clEnqueueNDRangeKernel call is reached. As far as I can see, memory is allocated correctly and there are no NULL pointers. Why is this error occurring? – Nike Jul 27 '12 at 19:24