Bradley-Roth Adaptive Thresholding Algorithm - How do I get better performance?

Question

I have the following code for image thresholding, using the Bradley-Roth image thresholding method.

from PIL import Image
import copy
import time
def bradley_threshold(image, threshold=75, windowsize=5):
    ws = windowsize
    image2 = copy.copy(image).convert('L')
    w, h = image.size
    l = image.convert('L').load()
    l2 = image2.load()
    threshold /= 100.0
    for y in xrange(h):
        for x in xrange(w):
            #find neighboring pixels
            neighbors =[(x+x2,y+y2) for x2 in xrange(-ws,ws) for y2 in xrange(-ws, ws) if x+x2>0 and x+x2<w and y+y2>0 and y+y2<h]
            #mean of all neighboring pixels
            mean = sum([l[a,b] for a,b in neighbors])/len(neighbors)
            if l[x, y] < threshold*mean:
                l2[x,y] = 0
            else:
                l2[x,y] = 255
    return image2

i = Image.open('test.jpg')
windowsize = 5
bradley_threshold(i, 75, windowsize).show()

This works fine when windowsize is small and the image is small. I've been using this image for testing:

I'm experiencing processing times of about 5 or 6 seconds when using a window size of 5, but if I bump my window size up to 20 and the algorithm is checking 20 pixels in each direction for the mean value, i get times upwards of one minute for that image.

If I use an image with a size like 2592x1936 with a window size of only 5, it takes nearly 10 minutes to complete.

So, how can I improve those times? Would a numpy array be faster? Is im.getpixel faster than loading the image into pixel access mode? Are there any other tips for speed boosts? Thanks in advance.

Convert the list expressions to generator expressions in `neighbors` and `sum`, by using parenthesis instead of square brackets, so that you're not allocating a list every time. — Colonel Thirty Two, Oct 12 '15 at 23:40
I'd recommend using `numpy` arrays as well. Especially since you can compute the integral image with only two `cumsum` calls, and it's just a matter of calculating summed areas for each pixel in the image. Here's some MATLAB code I wrote in the past: http://stackoverflow.com/questions/30487127/extract-a-page-from-a-uniform-background-in-an-image/30496377#30496377 and for very large images with relatively large windows, it averages about a few seconds each time. If you'd like, I can rewrite this algorithm using Python NumPy so you can benchmark this yourself or to help you out. — rayryeng, Oct 12 '15 at 23:45
@ColonelThirtyTwo How do I tackle the `len()` calls if neighbors is a generator? `len()` can't be used on generators. @rayryeng I'll check it out. Don't have a ton of experience with numpy, but I'll look at your MATLAB code and see what I can figure out. — Luke Taylor, Oct 12 '15 at 23:50
@rayryeng if you could translate it easily, I'd really appreciate that. I don't know matlab, and although well commented, it's a little confusing. — Luke Taylor, Oct 13 '15 at 00:25
@LukeTaylor - Certainly. I'll set some time aside tonight so I can do a code translation. Give me like half an hour or so. Make sure you have numpy installed though. However, since you have Pillow installed, it's very easy to convert the image into a numpy array, do the processing, then convert back to a PIL image for display. — rayryeng, Oct 13 '15 at 00:52
Wow, thanks a lot. I'm reading through the article you linked in your MATLAB version now. — Luke Taylor, Oct 13 '15 at 00:53
@LukeTaylor - No problem. Have a look at the code. I just finished. Also, the previous link I gave you has a description of the algorithm for better insight. I've also linked it to the post that I wrote for you below. — rayryeng, Oct 13 '15 at 02:00
@LukeTaylor - Cool. Consider accepting one of our answers so that it tells the StackOverflow community that you no longer need help. Good luck! — rayryeng, Oct 16 '15 at 05:33

rayryeng · Accepted Answer · 2019-04-16T19:50:47.363

Referencing our comments, I wrote a MATLAB implementation of this algorithm here: Extract a page from a uniform background in an image, and it was quite fast on large images.

If you'd like a better explanation of the algorithm, please see my other answer here: Bradley Adaptive Thresholding -- Confused (questions). This may be a good place to start if you want a better understanding of the code I wrote.

Because MATLAB and NumPy are similar, this is a re-implementation of the Bradley-Roth thresholding algorithm, but in NumPy. I convert the PIL image into a NumPy array, do the processing on this image, then convert back to a PIL image. The function takes in three parameters: the grayscale image image, the size of the window s and the threshold t. This threshold is different than what you have as this is following the paper exactly. The threshold t is a percentage of the total summed area of each pixel window. If the summed area is less than this threshold, then the output should be a black pixel - else it's a white pixel. The defaults for s and t are the number of columns divided by 8 and rounded, and 15% respectively:

import numpy as np
from PIL import Image

def bradley_roth_numpy(image, s=None, t=None):

    # Convert image to numpy array
    img = np.array(image).astype(np.float)

    # Default window size is round(cols/8)
    if s is None:
        s = np.round(img.shape[1]/8)

    # Default threshold is 15% of the total
    # area in the window
    if t is None:
        t = 15.0

    # Compute integral image
    intImage = np.cumsum(np.cumsum(img, axis=1), axis=0)

    # Define grid of points
    (rows,cols) = img.shape[:2]
    (X,Y) = np.meshgrid(np.arange(cols), np.arange(rows))

    # Make into 1D grid of coordinates for easier access
    X = X.ravel()
    Y = Y.ravel()

    # Ensure s is even so that we are able to index into the image
    # properly
    s = s + np.mod(s,2)

    # Access the four corners of each neighbourhood
    x1 = X - s/2
    x2 = X + s/2
    y1 = Y - s/2
    y2 = Y + s/2

    # Ensure no coordinates are out of bounds
    x1[x1 < 0] = 0
    x2[x2 >= cols] = cols-1
    y1[y1 < 0] = 0
    y2[y2 >= rows] = rows-1

    # Ensures coordinates are integer
    x1 = x1.astype(np.int)
    x2 = x2.astype(np.int)
    y1 = y1.astype(np.int)
    y2 = y2.astype(np.int)

    # Count how many pixels are in each neighbourhood
    count = (x2 - x1) * (y2 - y1)

    # Compute the row and column coordinates to access
    # each corner of the neighbourhood for the integral image
    f1_x = x2
    f1_y = y2
    f2_x = x2
    f2_y = y1 - 1
    f2_y[f2_y < 0] = 0
    f3_x = x1-1
    f3_x[f3_x < 0] = 0
    f3_y = y2
    f4_x = f3_x
    f4_y = f2_y

    # Compute areas of each window
    sums = intImage[f1_y, f1_x] - intImage[f2_y, f2_x] - intImage[f3_y, f3_x] + intImage[f4_y, f4_x]

    # Compute thresholded image and reshape into a 2D grid
    out = np.ones(rows*cols, dtype=np.bool)
    out[img.ravel()*count <= sums*(100.0 - t)/100.0] = False

    # Also convert back to uint8
    out = 255*np.reshape(out, (rows, cols)).astype(np.uint8)

    # Return PIL image back to user
    return Image.fromarray(out)


if __name__ == '__main__':
    img = Image.open('test.jpg').convert('L')
    out = bradley_roth_numpy(img)
    out.show()
    out.save('output.jpg')

The image is read in and converted to grayscale if required. The output image will be displayed, and it will be saved to the same directory where you ran the script to an image called output.jpg. If you want to override the settings, simply do:

out = bradley_roth_numpy(img, windowsize, threshold)

Play around with this to get good results. Using the default parameters and using IPython, I measured the average time of execution using timeit, and this is what I get for your image you uploaded in your post:

In [16]: %timeit bradley_roth_numpy(img)
100 loops, best of 3: 7.68 ms per loop

This means that running this function repeatedly 100 times on the image you uploaded, the best of 3 execution times gave on average 7.68 milliseconds per run.

I also get this image as a result when I threshold it:

score 4 · Answer 2 · answered Oct 12 '15 at 23:52

Profiling your code in IPython with %prun yields shows:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    50246    2.009    0.000    2.009    0.000 <ipython-input-78-b628a43d294b>:15(<listcomp>)
    50246    0.587    0.000    0.587    0.000 <ipython-input-78-b628a43d294b>:17(<listcomp>)
        1    0.170    0.170    2.829    2.829 <ipython-input-78-b628a43d294b>:5(bradley_threshold)
    50246    0.058    0.000    0.058    0.000 {built-in method sum}
    50257    0.004    0.000    0.004    0.000 {built-in method len}

i.e, almost all of the running time is due to Python loops (slow) and non-vectorized arithmetic (slow). So I would expect big improvements if you rewrite using numpy arrays; alternatively you could use cython if you can't work out how to vectorize your code.

Right. Cython isn't an option because I'm actually doing this in pythonista on iOS. I may look into numpy. — Luke Taylor, Oct 13 '15 at 00:00

Lars · Answer 3 · 2016-06-06T12:42:38.070

OK, I am a bit late here. Let me share my thoughts on that anyway:

You could speed it up by using dynamic programming to compute the means but it much easier and faster to let scipy and numpy do all the dirty work. (Note that I use Python3 for my code, so xrange is changed to range in your code).

#!/usr/bin/env python3

import numpy as np
from scipy import ndimage
from PIL import Image
import copy
import time

def faster_bradley_threshold(image, threshold=75, window_r=5):
    percentage = threshold / 100.
    window_diam = 2*window_r + 1
    # convert image to numpy array of grayscale values
    img = np.array(image.convert('L')).astype(np.float) # float for mean precision 
    # matrix of local means with scipy
    means = ndimage.uniform_filter(img, window_diam)
    # result: 0 for entry less than percentage*mean, 255 otherwise 
    height, width = img.shape[:2]
    result = np.zeros((height,width), np.uint8)   # initially all 0
    result[img >= percentage * means] = 255       # numpy magic :)
    # convert back to PIL image
    return Image.fromarray(result)

def bradley_threshold(image, threshold=75, windowsize=5):
    ws = windowsize
    image2 = copy.copy(image).convert('L')
    w, h = image.size
    l = image.convert('L').load()
    l2 = image2.load()
    threshold /= 100.0
    for y in range(h):
        for x in range(w):
            #find neighboring pixels
            neighbors =[(x+x2,y+y2) for x2 in range(-ws,ws) for y2 in range(-ws, ws) if x+x2>0 and x+x2<w and y+y2>0 and y+y2<h]
            #mean of all neighboring pixels
            mean = sum([l[a,b] for a,b in neighbors])/len(neighbors)
            if l[x, y] < threshold*mean:
                l2[x,y] = 0
            else:
                l2[x,y] = 255
    return image2

if __name__ == '__main__':
    img = Image.open('test.jpg')

    t0 = time.process_time()
    threshed0 = bradley_threshold(img)
    print('original approach:', round(time.process_time()-t0, 3), 's')
    threshed0.show()

    t0 = time.process_time()
    threshed1 = faster_bradley_threshold(img)
    print('w/ numpy & scipy :', round(time.process_time()-t0, 3), 's')
    threshed1.show()

That made it much faster on my machine:

$ python3 bradley.py 
original approach: 3.736 s
w/ numpy & scipy : 0.003 s

PS: Note that the mean I used from scipy behaves slightly different at the borders than the one from your code (for positions where the window for mean calculation is not fully contained in he image anymore). However, I think that shouldn't be a problem.

Another minor difference is that the window from the for-loops was not exactly centered at the pixel as the offset by xrange(-ws,ws) with ws=5 yields -5,-4-,...,3,4 and results in an average of -0.5. This probably wasn't intended.

Bradley-Roth Adaptive Thresholding Algorithm - How do I get better performance?

3 Answers3

Linked