I'm working on extracting text from images similar to the one shown below: warehouse boxes with all kinds of different labels. The images are often taken at poor angles.
My code:
import cv2
import pytesseract

im = cv2.imread('1.jpg')
config = '-l eng --oem 1 --psm 3'
text = pytesseract.image_to_string(im, config=config)
text_list = text.split('\n')
# strip surrounding whitespace and drop empty lines so that only words are returned
space_to_empty = [x.strip() for x in text_list]
space_clean_list = [x for x in space_to_empty if x]
print(space_clean_list)
For example, that image returns an output of ['L2 Sy', "////’7/'7///////////////"] on all variations of --oem and --psm values.
Perspective correction for the image gives a slightly better output (though still poor) of ['R19 159 942 sEMY', 'V/ ////////////////////I////I/////////////'], again on all variations of --oem and --psm values.
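For reference, this is roughly the kind of perspective correction I mean. The corner coordinates and output size below are hypothetical placeholders, not the values I actually used:

import cv2
import numpy as np
import pytesseract

im = cv2.imread('1.jpg')

# Hypothetical corners of the label, ordered top-left, top-right,
# bottom-right, bottom-left (in practice picked by hand or by a
# contour/corner detector).
src = np.float32([[120, 80], [620, 110], [600, 420], [100, 390]])

# Target rectangle of the desired (assumed) output size.
w, h = 500, 320
dst = np.float32([[0, 0], [w, 0], [w, h], [0, h]])

# Compute the homography and warp the label into a frontal view,
# then run Tesseract on the warped crop.
M = cv2.getPerspectiveTransform(src, dst)
warped = cv2.warpPerspective(im, M, (w, h))
print(pytesseract.image_to_string(warped, config='-l eng --oem 1 --psm 3'))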
My questions are:
- Why does Tesseract seem to perform so badly on images with poor perspective, compared to alternatives like the Vision API and PaddleOCR, which extract the text fairly well? Can this be corrected through some sort of fine-tuning in Tesseract, or is it a weak point of Tesseract that has to be addressed with preprocessing (such as blurring, thresholding, etc.; see the sketch after this list)? If preprocessing is required, the alternatives above seem better, as they do not need it.
- Despite changing the values for --oem and --psm as shown here, the output stays the same. Is this expected? (The second sketch after this list shows what I mean by "all variations".)
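For the preprocessing mentioned in the first question, this is the kind of pipeline I have in mind. It is a minimal sketch assuming a simple grayscale + blur + Otsu threshold, not something I currently run:

import cv2
import pytesseract

im = cv2.imread('1.jpg')

# Convert to grayscale, denoise slightly, then binarise with Otsu's threshold
# before handing the image to Tesseract.
gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (3, 3), 0)
_, thresh = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

print(pytesseract.image_to_string(thresh, config='-l eng --oem 1 --psm 3'))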
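And for the second question, by "all variations" I mean a sweep roughly like the one below. The exact --oem/--psm ranges and the error handling are illustrative rather than my exact test harness:

import cv2
import pytesseract

im = cv2.imread('1.jpg')

# Try the text-producing --psm modes (3-13) with the LSTM-capable --oem
# modes (1 = LSTM only, 3 = default) and print each result for comparison.
for oem in (1, 3):
    for psm in range(3, 14):
        config = f'-l eng --oem {oem} --psm {psm}'
        try:
            text = pytesseract.image_to_string(im, config=config)
        except pytesseract.TesseractError:
            continue  # skip combinations the installed Tesseract rejects
        print(oem, psm, repr(text))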