
I have an OpenCL kernel that applies a filter to a grayscale 1920x1080 image, and I would like to apply the same filter to N different images. What is the best practice for this case, so that the GPU is fully utilised and I get the highest frames per second?

I want to eliminate the overhead of running the kernel, switching control back to the CPU, and then launching the same kernel again with a different image.

talonmies
mmain
  • Have you read: http://stackoverflow.com/questions/6500905/techniques-to-reduce-cpu-to-gpu-data-transfer-latency and http://stackoverflow.com/questions/9287346/does-amds-opencl-offer-something-similar-to-cudas-gpudirect – Morrison Chang Apr 25 '16 at 05:32
  • You can enqueue more than one kernel at a time; the GPU will churn through them as fast as it can, so it won't sit idle waiting for work. As pointed out by the previous commenter, you can overlap image upload with compute by using multiple command queues synchronized with events. – Dithermaster Apr 25 '16 at 22:47
  • @Dithermaster Do you mean enqueueing the same kernel multiple times, with a different image per enqueue? I only have one kernel. – mmain May 03 '16 at 14:07

0 Answers