A few days ago I tried to find a Python implementation of Daugman's iris detection algorithm. I found only one repo on GitHub, but its implementation was very slow, around 46 seconds on my laptop. So I rewrote everything according to this formula:
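For reference, Daugman's integro-differential operator, which the code below approximates, is

$$\max_{(r,\,x_0,\,y_0)}\left|\,G_\sigma(r)\ast\frac{\partial}{\partial r}\oint_{r,\,x_0,\,y_0}\frac{I(x,y)}{2\pi r}\,ds\,\right|$$

where I(x, y) is the image, (x_0, y_0) is a candidate center, r is the radius and G_σ(r) is a Gaussian smoothing kernel: the intensity is summed along a circle and normalized by 2πr, the result is differentiated over r, smoothed, and the maximum response is taken.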
The final result is ~85 times faster (~530 ms on my laptop), but it is still not fast enough for real-time video processing: even at only 10 fps the per-frame budget is ~100 ms.
I've read these Stack Overflow topics 1, 2 and tried to vectorize the code.
I've tested map(), np.vectorize() and np.fromiter(), but they were no faster than my current solution (some of them were even slower), and I am out of ideas.
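For illustration, the np.vectorize attempt looked roughly like the sketch below (the scoring function and the ranges are placeholders, not the real per-circle computation). np.vectorize is essentially a convenience wrapper around a Python loop, so it cannot be expected to remove the per-point overhead:

import numpy as np

def score(x, y, r):
    # placeholder for the per-circle intensity computation in daugman()
    return float(x + y + r)

vec_score = np.vectorize(score)
xs, ys, rs = np.meshgrid(np.arange(36, 74, 3), np.arange(36, 74, 3), np.arange(10, 36))
values = vec_score(xs, ys, rs)  # still one Python-level call per (x, y, r) triple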
Is there a way to vectorize the code below, or do I need to use C extensions or try to run it on PyPy?
My solution:
import itertools

import cv2
import numpy as np


def daugman(center, start_r, gray_img):
    """Return the maximal intensity delta and the corresponding circle for a given center.

    center -- tuple(x, y)
    start_r -- int, initial radius
    gray_img -- grayscale picture as np.array(), it should be square
    """
    # get separate coordinates
    x, y = center
    # get img dimensions
    h, w = gray_img.shape
    # for calculation convenience
    img_shape = np.array([h, w])
    c = np.array(center)
    # define some other vars
    tmp = []
    mask = np.zeros_like(gray_img)

    # for every radius in range
    # we are presuming that iris will be no bigger than 1/3 of the picture
    for r in range(start_r, int(h / 3)):
        # draw circle on mask
        cv2.circle(mask, center, r, 255, 1)
        # get the pixels of the original image along that circle
        radii = gray_img & mask  # it is faster than np or cv2
        # normalize the summed intensity by the circle length
        tmp.append(radii[radii > 0].sum() / (2 * np.pi * r))
        # refresh mask
        mask.fill(0)

    # calculate the delta of intensity between consecutive radii
    tmp = np.array(tmp)
    tmp = tmp[1:] - tmp[:-1]
    # apply gaussian filter
    tmp = abs(cv2.GaussianBlur(tmp[:-1], (1, 5), 0))
    # get maximum value
    idx = np.argmax(tmp)
    # return value, center coords, radius
    return tmp[idx], [center, idx + start_r]
def find_iris(gray, start_r):
    """Apply daugman() on every pixel of the calculated image slice.

    gray -- grayscale img as np.array()
    start_r -- initial radius as int

    The selection of the image slice guarantees that every radius will be
    drawn within the image borders, so we don't need to check it (speed up).
    To speed up the whole process we pregenerate all centers for detection.
    """
    _, s = gray.shape
    # reduce step for better accuracy
    # 's/3' is the maximum radius of a daugman() search
    a = range(0 + int(s / 3), s - int(s / 3), 3)
    all_points = list(itertools.product(a, a))

    values = []
    coords = []

    for p in all_points:
        tmp = daugman(p, start_r, gray)
        if tmp is not None:
            val, circle = tmp
            values.append(val)
            coords.append(circle)

    # return the circle with the biggest intensity delta on the image
    # [(xc, yc), radius]
    return coords[np.argmax(values)]
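For context, a minimal usage sketch; the file name, crop and drawing step are assumptions mirroring the test setup from the update below:

import cv2

img = cv2.imread('eye.jpg')                       # test image, see "UPD" below
crop = img[20:130, 20:130]                        # rough eye region, as in the update below
gray = cv2.cvtColor(crop, cv2.COLOR_BGR2GRAY)

center, radius = find_iris(gray, start_r=10)      # [(xc, yc), radius]
cv2.circle(crop, center, radius, (0, 255, 0), 1)  # draw the detected iris on the crop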
UPD: image for testing
UPD 2: I've tried to create a tensor with all possible (x, y, r) values and map a function over it -- it is not faster.
Tensor creation:
# prepare img
img = cv2.imread('eye.jpg')
img = img[20:130, 20:130]
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
start_r = 10
# prepare some vars
h, w = gray.shape
img_shape = np.array([h, w])
mask = np.zeros_like(gray)
# generate points
_, s = gray.shape
a = range(0 + int(s/3), s - int(s/3), 3)
b = range(start_r, int(s/3))
all_points = list(itertools.product(a, a, b))
all_points_arr = np.array(all_points)
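For scale, the search space this produces is modest (the counts below assume eye.jpg is at least 130x130, so the crop above is 110x110):

# a = range(36, 74, 3) -> 13 candidate coordinates per axis
# b = range(10, 36)    -> 26 candidate radii
print(all_points_arr.shape)  # (4394, 3), i.e. 13 * 13 * 26 candidate (x, y, r) triples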
daugman() rewritten for this case:
def daugman_6(point, gray_img=gray):
    """Return the normalized circle intensity for a single (x, y, r) point.

    point -- tuple(x, y, r)
    gray_img -- grayscale picture as np.array(), it should be square
    """
    # get separate coordinates
    x, y, r = point
    # we are presuming that iris will be no bigger than 1/3 of the picture
    # draw circle on the (global) mask
    cv2.circle(mask, (x, y), r, 255, 1)
    # get the pixels of the original image along that circle
    radii = gray_img & mask  # it is faster than np or cv2
    # refresh mask
    mask.fill(0)
    return radii[radii > 0].sum() / (2 * np.pi * r)
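To get an actual circle back out of this mapping, the delta over the radius axis still has to be taken per center. A rough sketch of that step, reusing a, b and all_points from above (it skips the Gaussian smoothing that daugman() applies):

values = np.fromiter((daugman_6(p) for p in all_points), dtype=np.float32)
# product(a, a, b) varies the radius fastest, so reshape to (x, y, r)
grid = values.reshape(len(a), len(a), len(b))
deltas = np.abs(np.diff(grid, axis=-1))          # intensity jump between consecutive radii
xi, yi, ri = np.unravel_index(np.argmax(deltas), deltas.shape)
best_center = (a[xi], a[yi])
best_radius = b[ri]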
Results on a server with a Core i9:
# iterate via numpy array with for
%%timeit
[daugman_6(i) for i in all_points_arr]
#80.6 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# np.fromiter() on numpy array
%%timeit
np.fromiter((daugman_6(p) for p in all_points_arr), dtype=np.float32)
#82.9 ms ± 3.75 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# iteration over list of tuples
%%timeit
[daugman_6(i) for i in all_points]
#70 ms ± 2.86 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# original implementation
%%timeit
find_iris(gray, 10)
#71.6 ms ± 3.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)