I have a project that captures screenshots using the Microsoft Desktop Duplication API and processes them with a set of OpenCL kernels in real time. The screenshot itself never gets transferred to the host (CPU). It is a console application.
I've run into some portability issues with the Nvidia OpenCL runtime. The Desktop Duplication API returns its result (the screenshot) in DXGI_FORMAT_B8G8R8A8_UNORM format, which the Nvidia implementation doesn't support. Only DXGI_FORMAT_R8G8B8A8_UNORM is supported.
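To make the mismatch concrete: the two formats hold the same four 8-bit channels and differ only in the order of the bytes in memory (B, G, R, A versus R, G, B, A). Below is a minimal host-side sketch of that reordering, purely as an illustration. `Pixel` and `bgra_to_rgba` are hypothetical names made up for this example, not part of my project or of any API.

```cpp
#include <array>
#include <cstdint>

// One pixel as four 8-bit channels, stored in memory order.
// (Hypothetical type, used only for this illustration.)
using Pixel = std::array<std::uint8_t, 4>;

// Hypothetical helper: takes a pixel laid out as B, G, R, A
// (DXGI_FORMAT_B8G8R8A8_UNORM) and returns it laid out as
// R, G, B, A (DXGI_FORMAT_R8G8B8A8_UNORM).
// Only the first and third bytes swap; green and alpha stay put.
inline Pixel bgra_to_rgba(const Pixel& p) {
    return Pixel{p[2], p[1], p[0], p[3]};
}
```

In OpenCL C the same per-pixel reorder would presumably be a vector swizzle such as `pixel.zyxw`, but that only helps if the runtime lets the kernel read the surface in the first place, which is exactly what fails here.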