How to automatically select best result from try_all_threshold?

Question

I am applying thresholding to an image containing text and digits. Calling skimage.filters.try_all_threshold applies 7 thresholding algorithms to it. I can get the results, but I would like to know how to dynamically choose the single best result to pass on to the next processing step.

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

You need to define a measure of similarity between the original image and the binarized images, and then select the thresholding method that maximizes that measure.

Demo

The following code is only meant to put you on the right track. Note that the function similarity returns a random number rather than a sensible similarity measure. You should implement it yourself or replace it with an appropriate function.

import numpy as np
import matplotlib.pyplot as plt
import skimage.filters
from skimage.data import text

# The seven methods that try_all_threshold applies
threshold_methods = [skimage.filters.threshold_isodata,
                     skimage.filters.threshold_li,
                     skimage.filters.threshold_mean,
                     skimage.filters.threshold_minimum,
                     skimage.filters.threshold_otsu,
                     skimage.filters.threshold_triangle,
                     skimage.filters.threshold_yen,
                     ]

def similarity(img, threshold_method):
    """Similarity measure between the original image img and the
    result of applying threshold_method to it.
    """
    # Placeholder: a random score, to be replaced by a real measure
    return np.random.random()

img = text()  # load the sample image once
results = np.asarray([similarity(img, f) for f in threshold_methods])
# A similarity is maximized, so pick the method with the highest score
best_index = np.argmax(results)
best_method = threshold_methods[best_index]
threshold = best_method(img)
binary = img >= threshold

fig, ax = plt.subplots(1, 1)
ax.imshow(binary, cmap=plt.cm.gray)
ax.axis('off')
ax.set_title(best_method.__name__)
plt.show()
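
As a rough illustration of what a non-random similarity could look like when no reference segmentation is available, the sketch below scores each method by the Pearson correlation between the gray levels and the binarized image. This choice of measure is an assumption made here for illustration, not something prescribed by scikit-image, and the synthetic image is a hypothetical stand-in for real data.

```python
import numpy as np
from skimage import filters


def similarity(img, threshold_method):
    """Pearson correlation between the gray levels of img and the
    binary image produced by threshold_method (higher is better).
    """
    binary = img > threshold_method(img)
    if binary.all() or not binary.any():
        # A constant mask has zero variance, so the correlation is undefined.
        return -1.0
    return float(np.corrcoef(img.ravel().astype(float),
                             binary.ravel().astype(float))[0, 1])


# Hypothetical test image: a dark bar on a light, noisy background.
rng = np.random.default_rng(0)
img = np.full((64, 64), 200.0)
img[20:44, 28:36] = 40.0
img = np.clip(img + rng.normal(0, 10, img.shape), 0, 255).astype(np.uint8)

# threshold_minimum is left out here because it can raise RuntimeError
# when the histogram does not have two clear maxima.
methods = [filters.threshold_otsu,
           filters.threshold_yen,
           filters.threshold_isodata,
           filters.threshold_li,
           filters.threshold_mean,
           filters.threshold_triangle]

scores = np.asarray([similarity(img, f) for f in methods])
best_method = methods[int(np.argmax(scores))]
print(best_method.__name__, scores.max())
```

A measure like this only rewards a clean two-level split of the gray values, so it is worth checking on a few representative images before relying on it.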

Edit

Obviously, it makes no sense to choose the thresholding method randomly (as I did in the toy example above). Instead, you should implement a similarity measure that lets you automatically select the best-performing algorithm. One possible way to do so is to compute the misclassification error, i.e. the percentage of background pixels wrongly assigned to the foreground plus the percentage of foreground pixels wrongly assigned to the background. As the misclassification error is a dissimilarity measure rather than a similarity measure, you have to select the method that minimizes it, like this:

best_index = np.argmin(results)

Take a look at this paper for details on this and other approaches to thresholding performance assessment.
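
If a hand-labelled reference mask is available, the misclassification error can be sketched roughly as follows. The synthetic image, the ground_truth mask and the helper name misclassification_error are assumptions made for illustration; none of them come from scikit-image.

```python
import numpy as np
from skimage import filters


def misclassification_error(binary, ground_truth):
    """Fraction of pixels whose foreground/background label differs
    from the reference mask (0 = identical, 1 = fully inverted).
    """
    binary = np.asarray(binary, dtype=bool)
    ground_truth = np.asarray(ground_truth, dtype=bool)
    return float(np.mean(binary != ground_truth))


# Hypothetical data: a bright square (foreground) on a dark, noisy background.
rng = np.random.default_rng(0)
ground_truth = np.zeros((64, 64), dtype=bool)
ground_truth[16:48, 16:48] = True
image = np.where(ground_truth, 200.0, 50.0)
image = np.clip(image + rng.normal(0, 10, image.shape), 0, 255).astype(np.uint8)

methods = {
    "isodata": filters.threshold_isodata,
    "li": filters.threshold_li,
    "mean": filters.threshold_mean,
    "minimum": filters.threshold_minimum,
    "otsu": filters.threshold_otsu,
    "triangle": filters.threshold_triangle,
    "yen": filters.threshold_yen,
}

errors = {}
for name, method in methods.items():
    try:
        threshold = method(image)
    except RuntimeError:
        # threshold_minimum raises if it cannot find two histogram maxima
        continue
    errors[name] = misclassification_error(image > threshold, ground_truth)

# Dissimilarity measure: the best method is the one with the smallest error.
best_name = min(errors, key=errors.get)
print(best_name, errors[best_name])
```

In practice the reference mask would come from a few manually annotated images, and the method chosen on those would then be reused on the rest of the data set.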

Can I define the `measure of similarity` as any random value? When I save the binarized image, it comes out entirely black. Am I doing something wrong? - Prabhat Mishra, Apr 15 '18 at 13:06
