The Haar cascade classifier uses a sliding-window approach with an image pyramid to detect objects. For me it takes about 0.01 s to detect objects in an image. My question is: how can it be so fast while using a sliding window? (I implemented a CNN object detector that uses a sliding window with no pyramid, and it takes 2 seconds to detect objects.) What are the tricks for making a sliding-window approach run faster? I used two nested loops to slide over the whole image with some stride and also parallelized them, but it is still much slower than the OpenCV implementation.
- Are you using Python loops? Most OpenCV methods run in C++, and a lot of them are heavily optimized (SSE, OMP, etc.). Also, Python loops are quite slow; normally you try to avoid them and use something optimized like numpy (which is precompiled). – api55 Oct 19 '17 at 13:33
- Thanks for your comment. Is it possible to split an image into overlapping slices using numpy in one command and store them in sub numpy arrays? – Panda Oct 19 '17 at 14:12
- [Here is a link](https://stackoverflow.com/questions/15722324/sliding-window-in-numpy) with a way to do that with numpy. Also, if you want to do it in pure Python, you can try to convert it to C and compile it with [cython](http://cython.org/). – api55 Oct 19 '17 at 14:25
1 Answer
The quickest way (in my experience) is to use the `numpy.lib.stride_tricks.as_strided` function. Effectively, we first use that function to generate all of the patches (sliding window positions) as one big array, and then map our function over that array.

First, define the output shape as (number of window rows, number of window columns, kernel height, kernel width); the first two entries are the image height and width minus the kernel size plus one. Then define the strides, which are given in bytes (for an 8-bit image, moving one pixel is a stride of one byte). In this case the strides of the patches are simply the strides of the image repeated twice: once for moving between windows and once for moving inside a window. You can check the strides of the image with `img.strides`.
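
    import numpy as np

    def some_func(roi):
        '''
        simple function to return the mean of the region
        of interest
        '''
        return np.mean(roi)

    # Note: at this size the reshape below copies the patches (several GB)
    # and the Python loop is very slow; use a smaller image to try this out.
    img = np.zeros((30000, 30000), dtype=np.uint8)
    size = 3  # window size i.e. here is 3x3 window
    # one window per valid top-left position
    shape = (img.shape[0] - size + 1, img.shape[1] - size + 1, size, size)
    strides = 2 * img.strides  # tuple repetition: (row, col, row, col) strides in bytes
    patches = np.lib.stride_tricks.as_strided(img, shape=shape, strides=strides)
    patches = patches.reshape(-1, size, size)
    output_img = np.array([some_func(roi) for roi in patches])
    # back to 2D: one value per window position (smaller than img by size - 1)
    output_img = output_img.reshape(shape[:2])
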
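To see what the shape and strides arguments do, it can help to try them on a tiny array first. The following is only a minimal sketch and not part of the code above: the 4x5 test array and the names `small`, `win`, and `windows` are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Hypothetical 4x5 test image with distinct values so windows are easy to check.
small = np.arange(20, dtype=np.uint8).reshape(4, 5)
win = 3  # 3x3 window

# One window per valid top-left corner: (4-3+1, 5-3+1) = (2, 3) positions.
out_shape = (small.shape[0] - win + 1, small.shape[1] - win + 1, win, win)

# Tuple repetition: (row_stride, col_stride, row_stride, col_stride), in bytes.
out_strides = 2 * small.strides

# writeable=False guards against accidentally writing through the overlapping view.
windows = as_strided(small, shape=out_shape, strides=out_strides, writeable=False)

print(small.strides)   # (5, 1) for this C-contiguous uint8 array
print(windows.shape)   # (2, 3, 3, 3)

# windows[i, j] holds the same data as the slice small[i:i+win, j:j+win].
assert np.array_equal(windows[1, 2], small[1:4, 2:5])
```

No pixel data is copied here: `windows` is just another way of indexing the memory of `small`, which is why building it is cheap even for large images.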
There are other improvements you could make. In certain cases you could wrap your function with `np.vectorize()`, although the NumPy documentation notes that it is provided mainly for convenience rather than performance. If you only wanted the mean, you could have just used `output_img = patches.mean(axis=(-1, -2))` on the 4D patches array (before flattening it), which avoids both mapping a function and reshaping. There are also potentially quicker ways to map an array to a function; see this post. I've given this solution because any procedure can be put into the function and the question seemed pretty general.
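Since the question slides the window with a stride, here is a hedged sketch of the same idea using `numpy.lib.stride_tricks.sliding_window_view`, which NumPy added in version 1.20 as a safer way to build such views. It assumes NumPy 1.20 or newer; the image size, the window size, and the `step` value are arbitrary choices for illustration, not values from the question.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
# Hypothetical 120x160 grayscale test image.
img = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)

size = 24  # window size (assumed)
step = 8   # stride between neighbouring windows (assumed)

# Read-only view of every size x size window: shape (97, 137, 24, 24), no copy.
windows = sliding_window_view(img, (size, size))

# Keep only every `step`-th window position; basic slicing keeps this a view.
windows = windows[::step, ::step]

# One vectorized call over the last two axes instead of a Python loop.
means = windows.mean(axis=(-1, -2))

print(windows.shape)  # (13, 18, 24, 24)
print(means.shape)    # (13, 18)
```

Here `means[i, j]` is the mean of `img[i*step : i*step + size, j*step : j*step + size]`. For a function that cannot be expressed as array operations, the windows would still need a loop or a batched call, so the gain depends on the function.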

- There are issues with this approach, mentioned in the Notes section of the documentation: https://numpy.org/doc/stable/reference/generated/numpy.lib.stride_tricks.as_strided.html, which advises avoiding `as_strided` when possible. – hafiz031 Jun 13 '23 at 00:18