Efficient processing of pixel + neighborhood in numpy image

Question

I have a range image of a scene. I traverse the image and calculate the average change in depth under the detection window. The detection window changes size based on the average depth of the pixels surrounding the current location. I accumulate the average change to produce a simple response image.

Most of the time is spent in the for loop, which takes 40+ s for a 512x52 image on my machine. I was hoping for some speed-up. Is there a more efficient/faster way to traverse the image? Is there a better pythonic/numpy/scipy way to visit each pixel? Or shall I go learn cython?

EDIT: I have reduced running time to about 18s by using scipy.misc.imread() instead of skimage.io.imread(). Not sure what the difference is, I will try to investigate.

Here is a simplified version of the code:

import matplotlib.pylab as plt
import numpy as np
from skimage.io import imread
from skimage.transform import integral_image, integrate
import time

def intersect(a, b):
    '''Determine the intersection of two rectangles'''
    rect = (0,0,0,0)
    r0 = max(a[0],b[0])
    c0 = max(a[1],b[1])
    r1 = min(a[2],b[2])
    c1 = min(a[3],b[3])
    # Do we have a valid intersection?
    if r1 > r0 and  c1 > c0: 
         rect = (r0,c0,r1,c1)
    return rect

# Setup data
depth_src = imread("test.jpg", as_grey=True)
depth_intg = integral_image(depth_src)   # integrate to find sum depth in region
depth_pts = integral_image(depth_src > 0)  # integrate to find num points which have depth
boundary = (0,0,depth_src.shape[0]-1,depth_src.shape[1]-1) # rectangle to intersect with

# Image to accumulate response
out_img = np.zeros(depth_src.shape)

# Average dimensions of bbox/detection window per unit length of depth
model = (0.602,2.044)  # width, height

start_time = time.time()
for (r,c), junk in np.ndenumerate(depth_src):
    # Find points around current pixel      
    r0, c0, r1, c1 = intersect((r-1, c-1, r+1, c+1), boundary)

    # Calculate average of depth of points around current pixel
    scale =  integrate(depth_intg, r0, c0, r1, c1) * 255 / 9.0 

    # Based on average depth, create the detection window
    r0 = r - (model[0] * scale/2)
    c0 = c - (model[1] * scale/2)
    r1 = r + (model[0] * scale/2)
    c1 = c + (model[1] * scale/2)

    # Used scale optimised detection window to extract features
    r0, c0, r1, c1 = intersect((r0,c0,r1,c1), boundary)
    depth_count = integrate(depth_pts,r0,c0,r1,c1)
    if depth_count:
         depth_sum = integrate(depth_intg,r0,c0,r1,c1)
         avg_change = depth_sum / depth_count
         # Accumulate response
         out_img[r0:r1,c0:c1] += avg_change
print time.time() - start_time, " seconds"

plt.imshow(out_img)
plt.gray()
plt.show()

score 4 · Accepted Answer · edited May 23 '17 at 12:27

Michael, interesting question. It seems that the main performance problem is that each pixel in the image has two integrate() calls computed on it: one of size 3x3 and another of a size which is not known in advance. Calculating individual integrals in this way is extremely inefficient, regardless of what numpy functions you use; it's an algorithmic issue, not an implementation issue.

Consider an image of size N*N. You can calculate all integrals of any size K*K in that image using only approximately 4*N*N operations, not (as one might naively expect) N*N*K*K. The way to do that is to first calculate an image of sliding sums over a window of K in each row, and then sliding sums over the result in each column. Updating each sliding sum to move to the next pixel requires only adding the newest pixel in the current window and subtracting the oldest pixel of the previous window, thus two operations per pixel regardless of window size. We have to do that twice (for rows and columns), hence 4 operations per pixel.
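To make the "two operations per element" update concrete, here is a minimal one-dimensional sketch in plain Python. The function name sliding_sums_1d is made up for illustration; it only shows the add-newest/subtract-oldest step described above, not the full 2D computation.

```python
def sliding_sums_1d(values, k):
    """Sum of the last k values at each position (fewer near the start)."""
    sums = []
    running = 0
    for i, v in enumerate(values):
        running += v                  # add the newest element
        if i >= k:
            running -= values[i - k]  # subtract the element that left the window
        sums.append(running)
    return sums

values = [1, 2, 3, 4, 5, 6]
result = sliding_sums_1d(values, 3)   # [1, 3, 6, 9, 12, 15]
```

The cost per element is the same whether k is 3 or 300, which is the whole point of the approach.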

I am not sure if there is a sliding window sum built into numpy, but this answer suggests a couple of ways to do it, using stride tricks: https://stackoverflow.com/a/12713297/1828289. You can certainly accomplish the same with one loop over columns and one loop over rows (taking slices to extract a row/column).

Example:

import numpy

# img is a 2D ndarray
# K is the size of sums to calculate using sliding window
# a float accumulator avoids overflow when img has a small integer dtype (e.g. uint8)
row_sums = numpy.zeros(img.shape, dtype=float)
for i in range( img.shape[0] ):
    if i >= K:
        # subtract the row that left the window, add the newest row
        row_sums[i,:] = row_sums[i-1,:] - img[i-K,:] + img[i,:]
    elif i > 0:
        # window not full yet: keep accumulating
        row_sums[i,:] = row_sums[i-1,:] + img[i,:]
    else: # i == 0
        row_sums[i,:] = img[i,:]

col_sums = numpy.zeros(img.shape, dtype=float)
for j in range( img.shape[1] ):
    if j >= K:
        col_sums[:,j] = col_sums[:,j-1] - row_sums[:,j-K] + row_sums[:,j]
    elif j > 0:
        col_sums[:,j] = col_sums[:,j-1] + row_sums[:,j]
    else: # j == 0
        col_sums[:,j] = row_sums[:,j]

# here col_sums[i,j] should be equal to numpy.sum(img[i-K+1:i+1, j-K+1:j+1]) if i >= K-1 and j >= K-1
# first K-1 rows and columns in col_sums contain partial sums and can be ignored
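Windowed sums of this kind can also be obtained without explicit Python loops by taking differences of a cumulative-sum (integral) image. The following is a sketch under my own assumptions rather than part of the original example: box_sums is a hypothetical helper, and it defines the window as the K x K block ending at each pixel, truncated at the top and left edges.

```python
import numpy as np

def box_sums(img, K):
    """Sum of the K x K window ending at each pixel (truncated at the top/left edges)."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    # Integral image with an extra leading row and column of zeros
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    r_hi = np.arange(1, h + 1)[:, None]   # exclusive end row of each window
    c_hi = np.arange(1, w + 1)[None, :]   # exclusive end column of each window
    r_lo = np.maximum(r_hi - K, 0)        # start row, clamped at the border
    c_lo = np.maximum(c_hi - K, 0)        # start column, clamped at the border
    # Four-corner difference gives the sum over img[r_lo:r_hi, c_lo:c_hi]
    return (integral[r_hi, c_hi] - integral[r_lo, c_hi]
            - integral[r_hi, c_lo] + integral[r_lo, c_lo])

demo = np.arange(20).reshape(4, 5)
demo_sums = box_sums(demo, 2)
```

This trades the two short loops for a few array operations; both versions do an amount of work that does not depend on K.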

How do you best apply that to your case? I think you might want to pre-compute the integrals for 3x3 (average depth) and also for several larger sizes, and use the value of the 3x3 to select one of the larger sizes for the detection window (assuming I understand the intent of your algorithm). The range of larger sizes you need might be limited, or artificially limiting it might still work acceptably well: just pick the nearest size. Calculating all integrals together using sliding sums is so much more efficient that I am almost certain it is worth calculating them for a lot of sizes you would never use at a particular pixel, especially if some of the sizes are large.
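A rough sketch of that idea follows. It is not the asker's algorithm: the helper names, the square (rather than width/height) windows, the candidate half-sizes and the mapping from local mean depth to window size are all assumptions made for illustration.

```python
import numpy as np

def centered_box_sums(img, half):
    """Sum over the (2*half+1)-square window centred on each pixel, clipped at the borders."""
    img = np.asarray(img, dtype=float)
    h, w = img.shape
    integral = np.zeros((h + 1, w + 1))
    integral[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    r_lo = np.clip(np.arange(h) - half, 0, h)[:, None]
    r_hi = np.clip(np.arange(h) + half + 1, 0, h)[:, None]
    c_lo = np.clip(np.arange(w) - half, 0, w)[None, :]
    c_hi = np.clip(np.arange(w) + half + 1, 0, w)[None, :]
    return (integral[r_hi, c_hi] - integral[r_lo, c_hi]
            - integral[r_hi, c_lo] + integral[r_lo, c_lo])

def sums_for_nearest_size(depth, half_sizes, wanted_half):
    """Per pixel, take the precomputed sum whose half-size is closest to wanted_half."""
    half_sizes = np.asarray(half_sizes)
    # One full image of window sums per candidate size: shape (S, H, W)
    stack = np.stack([centered_box_sums(depth, int(s)) for s in half_sizes])
    # Index of the nearest candidate size at every pixel: shape (H, W)
    nearest = np.abs(half_sizes[:, None, None] - wanted_half[None, :, :]).argmin(axis=0)
    return np.take_along_axis(stack, nearest[None, :, :], axis=0)[0]

depth = np.random.default_rng(0).random((16, 16))
# 3x3 local mean, dividing by the number of pixels actually inside the image
local_mean = centered_box_sums(depth, 1) / centered_box_sums(np.ones_like(depth), 1)
wanted_half = 1 + 4 * local_mean          # hypothetical depth-to-size mapping
window_sums = sums_for_nearest_size(depth, [1, 2, 4], wanted_half)
```

The per-pixel loop disappears entirely; the price is computing every candidate size for every pixel, which is cheap as long as the list of sizes stays short.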

P.S. This is a minor addition, but you may want to avoid calling intersect() for every pixel: either (a) only process pixels which are farther from the edge than the max integral size, or (b) add margins to the image of the max integral size on all sides, filling the margins with either zeros or nans, or (c) (best approach) use slices to take care of this automatically: a slice index outside the boundary of an ndarray is automatically limited to the boundary, except of course negative indexes are wrapped around.
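A small demonstration of option (c), using a made-up 5x5 array: the upper bound of a slice is clipped for free, but a negative lower bound counts from the end of the axis, so it needs an explicit max(..., 0).

```python
import numpy as np

img = np.arange(25).reshape(5, 5)
half = 2

# Near the bottom-right corner: the upper bounds (7) are clipped to 5 silently
r, c = 4, 4
clipped_high = img[r - half:r + half + 1, c - half:c + half + 1]   # rows 2..4, cols 2..4

# Near the top-left corner: the lower bound -2 means "two from the end",
# so img[-2:3, -2:3] is an empty array rather than the window we want
r, c = 0, 0
wrapped = img[r - half:r + half + 1, c - half:c + half + 1]

# Clamping only the lower bound is enough
clamped = img[max(r - half, 0):r + half + 1, max(c - half, 0):c + half + 1]
```

So intersect() can be replaced by one max() per lower bound, with the upper bounds left to numpy.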

EDIT: added example of sliding window sums

Thanks. Computing the 3x3 window before the loop seems obvious now, thanks for the suggestion. I will investigate the sliding sums. I am still learning python/numpy etc. and haven't used strides; this gives me a good reason. I will do some timings and report back. Thanks again. — Michael, Nov 22 '12 at 13:35
@Michael, I added an example of sliding window sums, have a look and give it a try. — Alex I, Nov 27 '12 at 08:54
Oops... forgot to accept (fixed now). Initially couldn't get my head around the stride tricks. Your example helped. Thanks. — Michael, May 02 '13 at 02:18
