I am working on a project that needs to read the characters off license plates. Given an image of just the license plate, I use OpenCV to segment the characters and get their bounding boxes. I then crop the individual characters and would like to use Tesseract to recognize them.

The problem is that I'm getting really bad results, even though the characters seem to be cut out perfectly by OpenCV. I've included some example images below. Tesseract either fails to detect any character at all or detects an entirely wrong one. I don't mean that it confuses a 0 with an O, or a 1 with an l; it detects, for example, a 7 when a 4 is clearly visible.

[Example images: two cropped license-plate characters]

Is there anything I am doing wrong, or have I misunderstood the options I am setting? Help would be greatly appreciated, as I'm not seeing why Tesseract shouldn't recognize these characters.

(I'm using Tesseract OCR v4, in the LSTM mode)

Joseph Adams

1 Answer

You can recognize the characters with pytesseract in two steps: apply an adaptive threshold to each image, then run pytesseract on the result.

    1. Adaptive threshold

"Here, the algorithm determines the threshold for a pixel based on a small region around it. So we get different thresholds for different regions of the same image, which gives better results for images with varying illumination." (source: OpenCV image thresholding tutorial)
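To make the quoted idea concrete, here is a minimal pure-Python sketch of an inverted mean adaptive threshold next to a single global threshold. It is only an illustration under stated assumptions, not OpenCV's implementation: the function names, the toy image (two dark strokes on a background that gets darker from left to right), and the replicate-style border handling are all made up for this example, and the results are not guaranteed to match `cv2.adaptiveThreshold` bit for bit.

```python
def global_threshold_inv(gray, thresh):
    """Inverted global threshold: a pixel becomes 255 if it is <= thresh."""
    return [[255 if px <= thresh else 0 for px in row] for row in gray]


def adaptive_threshold_mean_inv(gray, block_size, c):
    """Inverted mean adaptive threshold on a 2-D list of 0-255 values.

    Each pixel is compared with the mean of its block_size x block_size
    neighbourhood minus c. Out-of-range neighbours are clamped to the
    nearest edge pixel (an assumption made for this sketch).
    """
    if block_size < 3 or block_size % 2 == 0:
        raise ValueError("block_size must be an odd number >= 3")
    height, width = len(gray), len(gray[0])
    half = block_size // 2
    out = []
    for y in range(height):
        out_row = []
        for x in range(width):
            total = 0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy = min(max(y + dy, 0), height - 1)
                    xx = min(max(x + dx, 0), width - 1)
                    total += gray[yy][xx]
            local_thresh = total / (block_size * block_size) - c
            out_row.append(255 if gray[y][x] <= local_thresh else 0)
        out.append(out_row)
    return out


def foreground_columns(binary):
    """Columns of the first row that were marked as foreground (255)."""
    return [x for x, px in enumerate(binary[0]) if px == 255]


# Toy image: the background fades from 200 to 80, with dark strokes
# in columns 2 and 6. Every row is identical.
row = [200, 185, 110, 155, 140, 125, 50, 95, 80]
image = [list(row) for _ in range(5)]

global_result = global_threshold_inv(image, 127)
adaptive_result = adaptive_threshold_mean_inv(image, block_size=3, c=10)

print("global  :", foreground_columns(global_result))    # [2, 5, 6, 7, 8]
print("adaptive:", foreground_columns(adaptive_result))  # [2, 6]
```

On this toy input the global threshold also marks the shadowed background on the right as foreground, while the local threshold picks out only the two stroke columns.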

[Input images: the two character crops]

Adaptive-threshold result: [images of the two thresholded characters]

pytesseract result: 4 9

Code:


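import cv2
import pytesseract

img_lst = ["four.png", "nine.png"]

for pth in img_lst:
    img = cv2.imread(pth)
    if img is None:  # cv2.imread returns None when the file cannot be read
        raise FileNotFoundError(pth)
    # Scale each character crop to a fixed 28x28 size
    img = cv2.resize(img, (28, 28))
    gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Inverted mean adaptive threshold: block size 47, constant C = 2
    thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY_INV, 47, 2)
    # --psm 6 treats the image as a single uniform block of text;
    # the "digits" config restricts recognition to digit characters
    txt = pytesseract.image_to_string(thr, config="--psm 6 digits")
    print(txt)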
Ahmet