I am trying to optimize the performance of my image-processing code, for example unsharp masking. It applies a calculation to a square region around each pixel of the image, in raster order.
I want to check whether copying several lines of the image to a dedicated "work area", while bypassing the cache, will help. The idea is that the image data will then not evict other useful data from the cache, which should improve performance.
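To make the idea concrete, here is a minimal sketch of the structure I have in mind. Everything in it is illustrative rather than my real code: the names `copy_rows_to_work_area` and `box_mean_via_work_area` are made up, a simple box mean stands in for the blur step of unsharp masking, the image is assumed to be 8-bit grayscale with contiguous rows, and border pixels are simply skipped. The copy is a plain `memcpy` for now; that is the part I would like to replace with a version that does not fill the cache.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placeholder: currently an ordinary memcpy.
   This is the call I would like to swap for a cache-bypassing copy. */
static void copy_rows_to_work_area(uint8_t *work, const uint8_t *src, size_t bytes)
{
    memcpy(work, src, bytes);
}

/* Box mean over a (2*radius+1) x (2*radius+1) window, in raster order.
   Assumes rows are contiguous (stride == width) and that `work` holds at
   least (2*radius+1) * width bytes. Border pixels of `out` are left untouched. */
void box_mean_via_work_area(const uint8_t *img, uint8_t *out,
                            int width, int height, int radius, uint8_t *work)
{
    const int k = 2 * radius + 1;
    assert(radius >= 0 && width >= k && height >= k);

    for (int y = radius; y < height - radius; ++y) {
        /* Copy the k image rows that the window needs for this output row. */
        copy_rows_to_work_area(work, img + (size_t)(y - radius) * width,
                               (size_t)k * width);

        for (int x = radius; x < width - radius; ++x) {
            unsigned sum = 0;
            /* All reads go to the work area, not to the image itself. */
            for (int dy = 0; dy < k; ++dy)
                for (int dx = 0; dx < k; ++dx)
                    sum += work[(size_t)dy * width + (x - radius + dx)];
            out[(size_t)y * width + x] = (uint8_t)(sum / (unsigned)(k * k));
        }
    }
}
```

Re-copying all k rows for every output row is wasteful; a rolling buffer that only brings in one new row per step would be the obvious refinement, but this version keeps the copy in one place so it is easy to swap out and measure.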
How can I implement a special form of memcpy, which doesn't update the cache?
I don't use OpenCV, but if it has such support, I am ready to try it.
I don't want to mark the whole image as an uncached area, because I have many algorithms running on it, and I want to measure the effect of my optimization attempt on just one algorithm.