
I'm looking to perform optical character recognition (OCR) on a display, and want the program to work under different light conditions. To do this, I need to process and threshold the image such that there is no noise surrounding each digit, allowing me to detect the contour of the digit and perform OCR from there. I need the threshold value I use to be adaptable to these different light conditions. I've tried adaptive thresholding, but I haven't been able to get it to work.

My image processing is simple: load the image (i), grayscale i (g), apply histogram equalization to g (h), and apply a binary threshold to h with a threshold value t. I've worked with a couple of different datasets, and found that the optimal threshold value for consistent OCR lies within the range of highest density in a histogram plot of (h) (the only part of the plot without gaps).

A histogram of (h). The values t=[190,220] are optimal for OCR. A more complete set of images describing my problem is available here: https://i.stack.imgur.com/qsFp0.jpg

My current solution, which works but is clunky and slow, checks that:

    1. There must be 3 digits
    2. The first digit must be reasonably small in size
    3. There must be at least one contour recognized as a digit
    4. The digit must be recognized in the digit dictionary

If any of these checks fails, the threshold is increased by 10 (beginning at a low value) and the attempt is repeated.

The fact that I can recognize the optimal threshold value on the histogram plot of (h) may just be confirmation bias, but I'd like to know if there's a way I can extract the value. This is different from how I've worked with histograms before, which has been more on finding peaks/valleys.

I'm using cv2 for image processing and matplotlib.pyplot for the histogram plots.

Jazz

4 Answers


Check this: link. It doesn't really depend on density; it works because you have a separation of two maxima. The local maxima correspond to the main classes: the left local maximum is the foreground (text pixels), and the right local maximum is the background (white paper). The optimal threshold should separate these two maxima, and its value lies in the local-minimum region between them.
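A minimal sketch of that idea on a synthetic bimodal histogram (the two-Gaussian data, smoothing window, and peak-picking are my own illustration, not from this answer): find the two tallest local maxima, then take the minimum between them as the threshold.

```python
import numpy as np

x = np.arange(256)
# Illustrative bimodal histogram: a dark "text" mode near 60 and a
# bright "paper" mode near 200.
hist = 5000 * np.exp(-((x - 60) / 10.0) ** 2) + 20000 * np.exp(-((x - 200) / 15.0) ** 2)

# Light smoothing so tiny fluctuations don't create spurious extrema.
smooth = np.convolve(hist, np.ones(5) / 5.0, mode="same")

# Local maxima: bins higher than both neighbours.
peaks = [i for i in range(1, 255) if smooth[i] > smooth[i - 1] and smooth[i] > smooth[i + 1]]

# The threshold is the minimum between the two tallest peaks.
p1, p2 = sorted(sorted(peaks, key=lambda i: smooth[i])[-2:])
threshold = p1 + int(np.argmin(smooth[p1:p2 + 1]))
print(threshold)
```

On this synthetic data the printed threshold lands in the gap between the two modes, which is the "local minimum region" the answer refers to.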

Andrey Smorodov

At first, I thought "well, just make a histogram of the indices at which data appears," which would totally work, but I don't think that will actually solve the underlying problem you're trying to address.

I think you're misinterpreting histogram equalization. What histogram equalization does is thin out the histogram in highly concentrated areas so that, if you take different bin sizes for the histogram, you'll get roughly equal quantities inside the bins. The only reason those values are dense is precisely because they appear less often in the image; histogram equalization makes other, more common values appear less often. And the reason that range works out well is, as you can see in the original grayscale histogram, that values between 190 and 220 are right where the image begins to get bright again; i.e., where there is a clear demarcation of bright values.

You can see the way equalizeHist works directly by plotting histograms with different bin sizes. For example, here's looping over bin sizes from 3 to 20:

Multiple hist values

Edit: So just to be clear, what you want is the demarcated area between the lower bump and the higher bump in your original histogram. You don't need to use equalized histograms for this. In fact, this is what Otsu thresholding (Otsu's method) actually does: you assume the data follows a bimodal distribution, and find the threshold that best separates the two distributions.

alkasm

Basically, what you're asking is to find the indices of the longest sequence of non-zero elements in a 256 x 1 array.

Based on this answer, you should get what you want like this:

import cv2
import numpy as np

# load in grayscale
img = cv2.imread("image.png", 0)

# calcHist returns a (256, 1) float array; flatten it to 1-D
hist = cv2.calcHist([img], [0], None, [256], [0, 256]).ravel()

# pad the non-zero mask with zeros on both ends (cast to int, since
# np.diff is not defined for boolean arrays), then find the indices
# where the mask changes, giving a (start, stop) pair for each run
mask = np.hstack(([0], (hist != 0).astype(np.int8), [0]))
non_zero_sequences = np.where(np.diff(mask))[0].reshape(-1, 2)

# pick the longest run (stop is exclusive)
longest_sequence_id = np.diff(non_zero_sequences, axis=1).argmax()
longest_sequence_start = non_zero_sequences[longest_sequence_id, 0]
longest_sequence_stop = non_zero_sequences[longest_sequence_id, 1]
Note that it is untested.
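As a quick sanity check, here is the same run-finding idea on a synthetic histogram (note the non-zero mask is cast to int, since np.diff is not defined for boolean arrays; the run positions are invented for illustration):

```python
import numpy as np

# synthetic 256-bin histogram with two runs of non-zero bins
hist = np.zeros(256)
hist[10:20] = 5      # short run: bins 10..19
hist[100:201] = 3    # long run: bins 100..200

# pad the mask, cast to int, and locate the (start, stop) of each run
mask = np.hstack(([0], (hist != 0).astype(np.int8), [0]))
runs = np.where(np.diff(mask))[0].reshape(-1, 2)

# the longest run
start, stop = runs[np.diff(runs, axis=1).argmax()]
print(start, stop)  # → 100 201 (stop is exclusive)
```

The midpoint of that run, or a value inside it, would then serve as the threshold candidate.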

Lescurel

I would also recommend using an automatic thresholding method like Otsu's method (here is a nice explanation of the method).

In Python OpenCV, you have this tutorial that explains how to do Otsu's binarization.

If you want to experiment with other automatic thresholding methods, you can look at the ImageJ / Fiji software. For instance, this page summarizes all the implemented methods.


Grayscale image:

Grayscale

Results:

Results

If you want to reimplement the methods, you can check the source code of the Auto_Threshold plugin. I used Fiji for this demo.

Catree