GPU programming has concepts similar to DMIP function composition on the CPU.
Although such composition is not easily automated on the GPU (some third-party libraries can do it), doing it manually is easier than on the CPU (see the Thrust example below).
Two main features of DMIP:
- processing the image in fragments small enough to fit in cache;
- parallel processing of different fragments, or of independent branches of the operation graph.
When a sequence of basic operations is applied to a large image,
feature 1 eliminates the RAM reads/writes between basic operations: all intermediate reads/writes stay in cache (see the sketch below). Feature 2 lets the pipeline exploit a multi-core CPU.
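A minimal CPU-side sketch of feature 1, assuming a hypothetical pipeline of two basic operations `op1` and `op2`; the fragment size is also an illustrative choice:

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical "basic operations", standing in for DMIP pipeline stages.
void op1(const float* src, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] * 2.0f;
}
void op2(const float* src, float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i] + 1.0f;
}

int main() {
    const std::size_t N    = 1 << 24;  // large image
    const std::size_t FRAG = 1 << 14;  // fragment sized to fit in cache
    std::vector<float> img(N, 0.5f), out(N);
    std::vector<float> tmp(FRAG);      // small intermediate buffer

    // Fragment-wise pipeline: `tmp` stays cache-resident between op1
    // and op2, so the intermediate image never round-trips through RAM.
    for (std::size_t off = 0; off < N; off += FRAG) {
        const std::size_t len = std::min(FRAG, N - off);
        op1(img.data() + off, tmp.data(), len);
        op2(tmp.data(), out.data() + off, len);
    }
    return 0;
}
```

Running `op1` and `op2` over the whole image instead would write and re-read the full intermediate image through RAM.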
The GPGPU counterpart of DMIP feature 1 is kernel fusion. Instead of applying one kernel per basic operation to the image data, one combines several basic operations into a single kernel, avoiding multiple rounds of GPU global-memory reads and writes.
A manual kernel-fusion example using Thrust can be found on page 26 of these slides.
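For illustration, here is a minimal sketch of the same idea; the `scale_then_clamp` functor and its two operations are made up for this example, not taken from the slides:

```cuda
#include <thrust/device_vector.h>
#include <thrust/transform.h>

// Two basic operations (scale, then clamp) fused into one functor, so
// a single thrust::transform launches one kernel instead of two.
struct scale_then_clamp {
    float gain;
    __host__ __device__
    float operator()(float x) const {
        float y = x * gain;                           // op 1: scale
        return y < 0.f ? 0.f : (y > 1.f ? 1.f : y);   // op 2: clamp
    }
};

int main() {
    thrust::device_vector<float> img(1 << 20, 0.5f);
    // One kernel launch and one global-memory round trip; applying the
    // two operations as separate transforms would double both.
    thrust::transform(img.begin(), img.end(), img.begin(),
                      scale_then_clamp{1.5f});
    return 0;
}
```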
The ArrayFire library appears to have made notable efforts toward automatic kernel fusion.
The GPGPU counterpart of DMIP feature 2 is concurrent kernel execution.
This feature increases the bandwidth requirement, and most GPGPU programs are already bandwidth-bound, so concurrent kernel execution is unlikely to be used very often.
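Still, a minimal sketch of the mechanism using CUDA streams; `kernelA` and `kernelB` are hypothetical independent branches of a processing graph:

```cuda
#include <cuda_runtime.h>

// Two independent "branches" of a processing graph.
__global__ void kernelA(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;
}
__global__ void kernelB(float* d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *dA, *dB;
    cudaMalloc(&dA, n * sizeof(float));
    cudaMalloc(&dB, n * sizeof(float));
    cudaMemset(dA, 0, n * sizeof(float));
    cudaMemset(dB, 0, n * sizeof(float));

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    // Launched on different streams, the two kernels may overlap on
    // devices that support concurrent kernel execution.
    kernelA<<<(n + 255) / 256, 256, 0, s1>>>(dA, n);
    kernelB<<<(n + 255) / 256, 256, 0, s2>>>(dB, n);

    cudaStreamSynchronize(s1);
    cudaStreamSynchronize(s2);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(dA);
    cudaFree(dB);
    return 0;
}
```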
CPU cache vs. GPGPU shared mem/cache
The CPU cache is what eliminates the RAM reads/writes in DMIP; in a fused GPGPU kernel, registers play the same role. A CPU thread in DMIP processes a small image fragment, whereas a GPGPU thread often processes only one pixel, so a few registers are enough to buffer a GPU thread's working data.
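A minimal sketch of such a fused kernel (the two operations are hypothetical):

```cuda
// Two basic operations fused into one kernel: the intermediate value
// `t` lives in a register and never touches global memory, playing
// the role the cache plays for a DMIP fragment on the CPU.
__global__ void fused(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float t = in[i] * 2.0f;  // op 1: result stays in a register
        out[i]  = t + 1.0f;      // op 2: one global read, one global write total
    }
}
```

It would be launched as, e.g., `fused<<<(n + 255) / 256, 256>>>(in, out, n)`.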
In image processing, GPGPU shared memory/cache is typically used when a result pixel depends on its surrounding pixels; image smoothing/filtering is a typical example (see the sketch below).
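A sketch of a 1-D box smoothing kernel using shared memory; `BLOCK`, `RADIUS`, and the clamp-at-border handling are illustrative choices:

```cuda
#define BLOCK  256
#define RADIUS 2

// 1-D box smoothing: each output sample averages 2*RADIUS+1 neighbors.
// The block stages its tile plus a halo into shared memory, so each
// global value is read once instead of 2*RADIUS+1 times.
__global__ void smooth1d(const float* in, float* out, int n) {
    __shared__ float tile[BLOCK + 2 * RADIUS];
    int gid = blockIdx.x * BLOCK + threadIdx.x;
    int lid = threadIdx.x + RADIUS;

    // Center element (clamped at the image border).
    tile[lid] = in[min(gid, n - 1)];
    // Threads at the block edges also load the left/right halo.
    if (threadIdx.x < RADIUS) {
        tile[lid - RADIUS] = in[max(gid - RADIUS, 0)];
        tile[lid + BLOCK]  = in[min(gid + BLOCK, n - 1)];
    }
    __syncthreads();

    if (gid < n) {
        float sum = 0.0f;
        for (int k = -RADIUS; k <= RADIUS; ++k)
            sum += tile[lid + k];
        out[gid] = sum / (2 * RADIUS + 1);
    }
}
```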