
I am trying to read numbers from images and cannot find a way to get it to work consistently (not all images have numbers). These are the images:

Example images 1 to 5 (not reproduced here; the original post also linked to an album with the same images in case they did not load).

This is the command I'm using to run tesseract on the images: pytesseract.image_to_string(image, timeout=2, config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789'). I have tried multiple configurations, but this seems to work best.
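For reference, the flags in that config string mean the following, per Tesseract's `--help-psm` / `--help-oem` output: `--psm 13` is "raw line" (treat the image as a single text line), `--oem 3` is the default engine mode (based on what is available), and `-c tessedit_char_whitelist=...` restricts recognition to the listed characters. The helper `build_config` below is hypothetical. It only assembles the same string from its parts, which makes it easier to try other PSM values:

```python
def build_config(psm: int, oem: int, whitelist: str) -> str:
    # --psm: page segmentation mode (13 = raw line, 6 = single uniform block of text)
    # --oem: OCR engine mode (3 = default, based on what is available)
    # -c tessedit_char_whitelist=...: only these characters may be recognized
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


digits_only = build_config(13, 3, "0123456789")
print(digits_only)
# Usage (requires Tesseract installed):
# pytesseract.image_to_string(image, timeout=2, config=digits_only)
```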

As far as preprocessing goes, this works the best:

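    import cv2
    import numpy as np
    # img is the input RGB image (e.g. a PIL Image); thresh is the binarization cutoff (0-255)
    gray = cv2.cvtColor(np.array(img), cv2.COLOR_RGB2GRAY)
    gray = cv2.bilateralFilter(gray, 11, 17, 17)
    im_bw = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY_INV)[1]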
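The important detail in that last step is the polarity flip done by `THRESH_BINARY_INV`. Below is a minimal pure-Python sketch of the documented rule (`dst = 0` if `src > thresh`, else `maxval`) applied to a few sample gray values. The function name `threshold_binary_inv` is hypothetical, and 230 is just an example cutoff:

```python
def threshold_binary_inv(pixels, thresh, maxval=255):
    # Same rule as cv2.THRESH_BINARY_INV:
    # values strictly above thresh become 0 (black), all others become maxval (white)
    return [0 if p > thresh else maxval for p in pixels]


sample = [255, 240, 230, 100, 0]
print(threshold_binary_inv(sample, 230))  # [0, 0, 255, 255, 255]
```

So bright (white) digits end up black on a white background, which is generally the polarity Tesseract handles best.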

This works for all images except the third one. To deal with the lines in the third image, I tried extracting the edges with cv2.Canny using a fairly large threshold, which works. However, when I draw the edges back onto the image, Tesseract does not read the numbers correctly, even though more than 95% of each number's edges are captured.

I have also tried resizing the image, applying cv2.morphologyEx, blurring it, etc., but I cannot find an approach that works in every case.

Thank you.

vibe11

1 Answer


cv2.resize has consistently worked for me with INTER_CUBIC interpolation.

Adding this last step to pre-processing would most likely solve your problem.

    im_bw_scaled = cv2.resize(im_bw, (0, 0), fx=4, fy=4, interpolation=cv2.INTER_CUBIC)

You can experiment with the scale factor; I used 4 above.

EDIT:

The following code worked very well on your images, even those with special characters. Please try it on the rest of your dataset. Scaling, Otsu thresholding and erosion turned out to be the best combination.

    import cv2
    import numpy
    import pytesseract

    # Only needed if the tesseract executable is not on your PATH:
    # pytesseract.pytesseract.tesseract_cmd = "<path to tesseract.exe>"

    # Page segmentation mode (PSM) changed to 6, which assumes a single uniform block of text per image
    custom_config = r'--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789'

    # Load the image as grayscale
    img = cv2.imread("5.png", cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError("Could not read 5.png")

    # Set every pixel that is not pure white to black (all the characters were white)
    img[img != 255] = 0

    # Scale it 10x
    scaled = cv2.resize(img, (0, 0), fx=10, fy=10, interpolation=cv2.INTER_CUBIC)

    # Retained your bilateral filter
    filtered = cv2.bilateralFilter(scaled, 11, 17, 17)

    # Inverse binary threshold; with THRESH_OTSU the threshold is chosen automatically (the 0 is ignored)
    thresh = cv2.threshold(filtered, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]

    # Erode the image to bulk up the characters for Tesseract
    # (erosion shrinks the white background, which thickens the black digits)
    kernel = numpy.ones((5, 5), numpy.uint8)
    eroded = cv2.erode(thresh, kernel, iterations=2)

    pre_processed = eroded

    # Feed the pre-processed image to Tesseract and print the output.
    # strip() removes trailing whitespace (Tesseract typically appends a newline and form feed),
    # so an empty result is actually detected
    ocr_text = pytesseract.image_to_string(pre_processed, config=custom_config).strip()
    if ocr_text:
        print(ocr_text)
    else:
        print("No string detected")
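To show what `THRESH_OTSU` does in the code above, here is a minimal pure-Python sketch of Otsu's method on a list of 8-bit gray values: it tries every threshold and keeps the one that maximizes the between-class variance. The function name `otsu_threshold` is hypothetical, and OpenCV's own implementation may differ in details such as tie-breaking.

```python
def otsu_threshold(pixels):
    """Return t in 0..255 maximizing between-class variance,
    where class 0 is pixels <= t and class 1 is pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight0 = 0  # number of pixels <= t
    sum0 = 0     # sum of intensities of pixels <= t
    for t in range(256):
        weight0 += hist[t]
        sum0 += t * hist[t]
        weight1 = total - weight0
        if weight0 == 0 or weight1 == 0:
            continue  # one class is empty, so no split here
        mean0 = sum0 / weight0
        mean1 = (sum_all - sum0) / weight1
        # Between-class variance, up to the constant factor 1 / total**2
        var_between = weight0 * weight1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


# Bimodal sample: dark background (10) and bright digits (200)
sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
# Same polarity as THRESH_BINARY_INV: pixels above t become 0, the rest 255
binary = [0 if p > t else 255 for p in sample]
print(t)  # 10
```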
  • Thanks! What threshold did you use? – vibe11 Jul 17 '20 at 22:03
  • Otsu's method works well: you supply a range of threshold values (min and max limits) and let the method find the best threshold. You will find a good explanation here: https://stackoverflow.com/a/23260699 – Dheeraj Mohan Jul 19 '20 at 04:47
  • Thanks! Using your code and adding a custom threshold at 230 worked for all my test cases. – vibe11 Jul 22 '20 at 21:07