I have an OpenCL kernel that applies a filter to a 1920x1080 grayscale image, and I would like to apply the same filter to N different images. To fully utilise the GPU, what is the best practice in this case for achieving the highest frames per second?
I want to eliminate the overhead of launching the kernel, switching control back to the CPU, and then launching the same kernel again with a different image.
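One common way to frame the single-launch alternative is to stack all N images back to back in one buffer and use the third NDRange dimension as the image index. The kernel is then enqueued once with `clEnqueueNDRangeKernel` using `work_dim = 3` and a global size of `{1920, 1080, N}`, and each work-item reads `get_global_id(2)` to find its image. The sketch below is plain C that emulates that indexing on the CPU, not OpenCL host code. It assumes the images fit in one contiguous buffer, and the 3x3 box blur, `filter_pixel` and `run_batched` are hypothetical stand-ins for the real filter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pixel filter (3x3 box blur, borders clamped) standing in
   for the real kernel body. In an OpenCL kernel, x, y and img would come from
   get_global_id(0), get_global_id(1) and get_global_id(2). */
static unsigned char filter_pixel(const unsigned char *src, int w, int h,
                                  int x, int y, int img)
{
    /* All N images sit back to back in one buffer; image img starts here. */
    const unsigned char *plane = src + (size_t)img * (size_t)w * (size_t)h;
    int sum = 0;
    for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
            int xx = x + dx, yy = y + dy;
            /* Clamp to this image's borders so neighbours never leak
               into the previous or next image in the stack. */
            if (xx < 0) xx = 0;
            if (xx >= w) xx = w - 1;
            if (yy < 0) yy = 0;
            if (yy >= h) yy = h - 1;
            sum += plane[(size_t)yy * w + xx];
        }
    }
    return (unsigned char)(sum / 9);
}

/* CPU emulation of a single 3D NDRange launch with global size (w, h, n):
   every (x, y, img) work-item is covered in one "launch" instead of n. */
static void run_batched(const unsigned char *src, unsigned char *dst,
                        int w, int h, int n)
{
    assert(w > 0 && h > 0 && n > 0);
    for (int img = 0; img < n; ++img)
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                dst[(size_t)img * w * h + (size_t)y * w + x] =
                    filter_pixel(src, w, h, x, y, img);
}
```

On the GPU the three loops in `run_batched` would disappear. The runtime schedules all `1920 * 1080 * N` work-items from one enqueue, so the per-image launch cost is paid once per batch rather than once per image.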