Adjusting pytesseract parameters

Question

Note: I am migrating this question from Data Science Stack Exchange, where it received little exposure.

I am trying to implement an OCR solution to identify the numbers read from the picture of a screen.

I am adapting this pyimagesearch tutorial to my problem.

Because I am dealing with a dark background, I first invert the image, before converting it to grayscale and thresholding it:

inverted_cropped_image = cv2.bitwise_not(cropped_image)
gray = get_grayscale(inverted_cropped_image)
thresholded_image = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)[1]
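
To see what these three steps do to a single gray pixel, here is a minimal pure-Python sketch (no OpenCV needed). The helper names `invert`, `threshold_binary`, and `pipeline` are hypothetical; they only mimic `cv2.bitwise_not` and `cv2.threshold(..., cv2.THRESH_BINARY)` for one 8-bit value.

```python
def invert(value):
    """Mimic cv2.bitwise_not for one 8-bit value."""
    return 255 - value


def threshold_binary(value, thresh=100, maxval=255):
    """Mimic cv2.THRESH_BINARY: maxval if value > thresh, else 0."""
    return maxval if value > thresh else 0


def pipeline(value):
    # Invert first, then threshold, in the same order as the snippet above.
    return threshold_binary(invert(value))


# Inverting and then thresholding at 100 keeps original values below 155,
# because 255 - v > 100 is the same condition as v < 155.
for v in (0, 154, 155, 255):
    print(v, "->", pipeline(v))
```

In other words, with these settings every original pixel darker than 155 ends up white and everything brighter ends up black.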

Then I call pytesseract's image_to_data function to output a dictionary containing the different text regions and their confidence scores:

from pytesseract import Output
results = pytesseract.image_to_data(thresholded_image, output_type=Output.DICT)

Finally, I iterate over the results and plot them when their confidence exceeds a user-defined threshold (70%). What bothers me is that my script identifies everything in the image except the number that I would like to recognize (1227.938).

My first guess is that the image_to_data parameters are not set properly.

Checking this website, I selected a page segmentation mode (psm) of 11 (sparse text) and tried whitelisting only the characters I expect (tessedit_char_whitelist=0123456789m.):

results = pytesseract.image_to_data(thresholded_image, config='--psm 11 --oem 3 -c tessedit_char_whitelist=0123456789m.', output_type=Output.DICT)

Alas, this is even worse, and the script now identifies nothing at all!
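
When comparing several settings, it can help to assemble the config string programmatically. The sketch below is only an illustration: `build_config` is a hypothetical helper, and the list of page segmentation modes to try (6, 7, 11) is an assumption about what is worth testing, not a recommendation.

```python
def build_config(psm, oem=3, whitelist=None):
    """Assemble a Tesseract config string for pytesseract's config= argument."""
    parts = ["--psm {}".format(psm), "--oem {}".format(oem)]
    if whitelist:
        parts.append("-c tessedit_char_whitelist={}".format(whitelist))
    return " ".join(parts)


# Candidate settings to compare; which one works best depends on the image.
for psm in (6, 7, 11):
    print(build_config(psm, whitelist="0123456789."))
```

Each resulting string can be passed as `config=` to `pytesseract.image_to_data`, exactly like the hand-written string above.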

Do you have any suggestion? Am I missing something obvious here?

EDIT #1:

At Ann Zen's request, here's the code used to obtain the first image:

import imutils
import cv2
import matplotlib.pyplot as plt
import numpy as np
import pytesseract
from pytesseract import Output

def get_grayscale(image):
    return cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

filename = "IMAGE.JPG"
cropped_image = cv2.imread(filename)
inverted_cropped_image = cv2.bitwise_not(cropped_image)

gray = get_grayscale(inverted_cropped_image)

thresholded_image = cv2.threshold(gray, 100, 255, cv2.THRESH_BINARY)[1]

results = pytesseract.image_to_data(thresholded_image, config='--psm 11 --oem 3 -c tessedit_char_whitelist=0123456789m.', output_type=Output.DICT)

color = (255, 255, 255)
for i in range(0, len(results["text"])):
    x = results["left"][i]
    y = results["top"][i]
    w = results["width"][i]
    h = results["height"][i]
    text = results["text"][i]
    conf = int(float(results["conf"][i]))  # conf may be a float or a string such as "-1"
    print("Confidence: {}".format(conf))
    if conf > 70:
        print("Confidence: {}".format(conf))
        print("Text: {}".format(text))
        print("")
        text = "".join([c if ord(c) < 128 else "" for c in text]).strip()
        cv2.rectangle(cropped_image, (x, y), (x + w, y + h), color, 2)
        cv2.putText(cropped_image, text, (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 1.2, color, 3)
cv2.imshow('Image', cropped_image)
cv2.waitKey(0)
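
The confidence filter in the loop above can also be factored into a small helper that is testable without an image. This is a sketch under the assumption that `results` has the shape returned by `image_to_data` with `Output.DICT`; `confident_words` is a hypothetical name, and the sample dictionary is hand-made rather than real OCR output.

```python
def confident_words(results, min_conf=70):
    """Yield (text, conf, box) for non-empty entries above min_conf."""
    for i, raw_text in enumerate(results["text"]):
        text = raw_text.strip()
        # conf may arrive as an int, a float, or a string such as "-1".
        conf = float(results["conf"][i])
        if text and conf > min_conf:
            box = (results["left"][i], results["top"][i],
                   results["width"][i], results["height"][i])
            yield text, conf, box


# Hand-made sample in the same shape (hypothetical values).
sample = {
    "text": ["", "1227", ".938"],
    "conf": ["-1", 96.4, 41],
    "left": [0, 80, 250], "top": [0, 10, 40],
    "width": [400, 160, 60], "height": [120, 70, 35],
}
print(list(confident_words(sample)))
```

Lowering `min_conf` is a quick way to check whether the missing number is detected at all but rejected by the 70% cut-off.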

EDIT #2:

Rarely have I spent reputation points so well! All three replies posted so far helped me refine my algorithm.

First, I wrote a Tkinter program allowing me to manually crop the image around the number of interest (modifying the one found in this SO post).

Then I used Ann Zen's idea of narrowing down the search area around the fractional part. I am using her nifty process function to prepare my grayscale image for contour extraction: contours, _ = cv2.findContours(process(img_gray), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE). I am using RETR_EXTERNAL to avoid dealing with overlapping bounding rectangles.

I then sorted my contours from left to right. Bounding rectangles exceeding a user-defined threshold are associated with the integral part (white rectangles); otherwise they are associated with the fractional part (black rectangles).
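
The sorting and splitting step can be sketched as follows. This assumes the threshold is applied to the height of each bounding rectangle, which is an assumption made for illustration; `split_number_parts` and the sample rectangles are hypothetical.

```python
def split_number_parts(rects, height_threshold):
    """Sort (x, y, w, h) rectangles left to right and split them by height."""
    ordered = sorted(rects, key=lambda r: r[0])  # sort on the x coordinate
    integral = [r for r in ordered if r[3] > height_threshold]
    fractional = [r for r in ordered if r[3] <= height_threshold]
    return integral, fractional


# Hypothetical bounding rectangles, e.g. from cv2.boundingRect on each contour.
rects = [(210, 45, 20, 30), (10, 10, 40, 70), (60, 10, 40, 70), (240, 45, 20, 30)]
integral, fractional = split_number_parts(rects, height_threshold=50)
print(integral)
print(fractional)
```

Because both lists keep the left-to-right order, the characters read from each group can simply be concatenated afterwards.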

I then extracted the characters using Esraa's approach, i.e., applying a Gaussian blur before calling Tesseract. I used a much larger kernel (15x15 instead of 3x3) to achieve this.

I am not out of the woods yet, but hopefully I will get better results by using Ahx's adaptive thresholding.
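
For reference, the idea behind mean adaptive thresholding can be sketched without OpenCV. The function below is a simplified pure-Python illustration, not the OpenCV implementation: `adaptive_threshold_mean` is a hypothetical name, rounding details may differ from `cv2.adaptiveThreshold`, and the one-row sample image is made up to show a brightness gradient with two bright strokes.

```python
def adaptive_threshold_mean(image, block_size=3, c=2, maxval=255):
    """Mean adaptive threshold on a 2-D list of 8-bit values.

    Each pixel is compared with the mean of its block_size x block_size
    neighbourhood minus c. Border pixels reuse the nearest edge value.
    """
    rows, cols = len(image), len(image[0])
    half = block_size // 2
    out = []
    for y in range(rows):
        out_row = []
        for x in range(cols):
            total = 0
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    yy = min(max(y + dy, 0), rows - 1)  # clamp to the image
                    xx = min(max(x + dx, 0), cols - 1)
                    total += image[yy][xx]
            local_mean = total / (block_size * block_size)
            out_row.append(maxval if image[y][x] > local_mean - c else 0)
        out.append(out_row)
    return out


# Background brightens from left to right; strokes sit at indices 2 and 7.
image = [[10, 20, 90, 40, 50, 60, 70, 140, 90, 100]]
print([255 if v > 85 else 0 for v in image[0]])            # global threshold
print(adaptive_threshold_mean(image, block_size=3, c=-5)[0])  # adaptive
```

In this sample no single global threshold isolates both strokes, because the left stroke (90) is no brighter than the background on the right, whereas the local comparison picks out exactly the two strokes.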

Please post the full code you used to obtain the first image in your post. Thank you! — Red, Mar 15 '22 at 02:10
Thanks for your comment, @AnnZen. I just edited my post accordingly. — Sheldon, Mar 15 '22 at 18:41

score 2 · Answer 1 · answered Mar 16 '22 at 04:08

The Concept

As you have probably heard, pytesseract struggles to read text of different sizes on the same line as a single piece of text. In your case, you want to detect the 1227.938, where the 1227 is much larger than the .938.

One way to go about solving this is to have the program estimate where the .938 is, and enlarge that part of the image. After that, pytesseract will have no problem in returning the text.

The Code

import cv2
import numpy as np
import pytesseract

def process(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(img_gray, 200, 255, cv2.THRESH_BINARY)
    img_canny = cv2.Canny(thresh, 100, 100)
    kernel = np.ones((3, 3))
    img_dilate = cv2.dilate(img_canny, kernel, iterations=2)
    return cv2.erode(img_dilate, kernel, iterations=2)

img = cv2.imread("image.png")
img_copy = img.copy()
hh = 50

contours, _ = cv2.findContours(process(img), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
for cnt in contours:
    if 20 * hh < cv2.contourArea(cnt) < 30 * hh:
        x, y, w, h = cv2.boundingRect(cnt)
        ww = int(hh / h * w)
        src_seg = img[y: y + h, x: x + w]
        dst_seg = img_copy[y: y + hh, x: x + ww]
        h_seg, w_seg = dst_seg.shape[:2]
        dst_seg[:] = cv2.resize(src_seg, (ww, hh))[:h_seg, :w_seg]

gray = cv2.cvtColor(img_copy, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)
results = pytesseract.image_to_data(thresh)

for b in map(str.split, results.splitlines()[1:]):
    if len(b) == 12:
        x, y, w, h = map(int, b[6: 10])
        cv2.putText(img, b[11], (x, y + h + 15), cv2.FONT_HERSHEY_COMPLEX, 0.6, 0)

cv2.imshow("Result", img)
cv2.waitKey(0)

The Output

Here is the input image:

And here is the output image:

As you have said in your post, the only part you need is the decimal 1227.938. If you want to filter out the rest of the detected text, you can try tweaking some parameters. For example, replacing the 180 in _, thresh = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY) with 230 will result in the output image:

The Explanation

Import the necessary libraries:

import cv2
import numpy as np
import pytesseract

Define a function, process(), that takes an image array and returns a binary image array, processed so that contours can be detected properly:

def process(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, thresh = cv2.threshold(img_gray, 200, 255, cv2.THRESH_BINARY)
    img_canny = cv2.Canny(thresh, 100, 100)
    kernel = np.ones((3, 3))
    img_dilate = cv2.dilate(img_canny, kernel, iterations=2)
    return cv2.erode(img_dilate, kernel, iterations=2)

I'm sure that you don't have to do this, but due to a problem in my environment, I have to add pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe' before I can call the pytesseract.image_to_data() method, or it throws an error:

pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'

Read in the original image, make a copy of it, and define the rough height of the large part of the decimal:

img = cv2.imread("image.png")
img_copy = img.copy()
hh = 50

Detect the contours of the processed version of the image, and add a filter that roughly filters out the contours so that the small text remains:

contours, _ = cv2.findContours(process(img), cv2.RETR_TREE, cv2.CHAIN_APPROX_NONE)
for cnt in contours:
    if 20 * hh < cv2.contourArea(cnt) < 30 * hh:

Define the bounding box of each contour that didn't get filtered out, and use the properties to enlarge those parts of the image to the height defined for the large text (making sure to also scale the width accordingly):

        x, y, w, h = cv2.boundingRect(cnt)
        ww = int(hh / h * w)
        src_seg = img[y: y + h, x: x + w]
        dst_seg = img_copy[y: y + hh, x: x + ww]
        h_seg, w_seg = dst_seg.shape[:2]
        dst_seg[:] = cv2.resize(src_seg, (ww, hh))[:h_seg, :w_seg]
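
The arithmetic behind the area filter and the enlargement can be checked in isolation. This is a small sketch with hypothetical helper names (`passes_area_filter`, `enlarged_size`) and made-up sample numbers; it reuses hh = 50 from the code above.

```python
def passes_area_filter(area, hh=50):
    """Same test as the contour filter: 20 * hh < area < 30 * hh."""
    return 20 * hh < area < 30 * hh


def enlarged_size(w, h, target_h=50):
    """Scale a w x h box to target_h while keeping the aspect ratio."""
    return int(target_h / h * w), target_h


print(passes_area_filter(1200))  # hypothetical contour area in pixels
print(enlarged_size(60, 25))     # hypothetical 60 x 25 bounding box
```

With hh = 50 the filter keeps contour areas strictly between 1000 and 1500 pixels, so hh and the two multipliers need adjusting for images at a different scale.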

Finally, we can use the pytesseract.image_to_data() method to detect the text. Of course, we'll need to threshold the image again:

gray = cv2.cvtColor(img_copy, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(gray, 180, 255, cv2.THRESH_BINARY)
results = pytesseract.image_to_data(thresh)

for b in map(str.split, results.splitlines()[1:]):
    if len(b) == 12:
        x, y, w, h = map(int, b[6: 10])
        cv2.putText(img, b[11], (x, y + h + 15), cv2.FONT_HERSHEY_COMPLEX, 0.6, 0)

cv2.imshow("Result", img)
cv2.waitKey(0)
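
The loop above relies on whitespace splitting and on rows with text having exactly 12 fields. As an alternative sketch, the same TSV string can be parsed by column name with the standard csv module. `parse_tsv` is a hypothetical helper and the sample string is hand-written, not real OCR output.

```python
import csv
import io


def parse_tsv(tsv_text):
    """Turn image_to_data's TSV string into a list of dicts, one per word."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        if text:  # rows for pages, blocks, and lines carry no text
            words.append({
                "text": text,
                "conf": float(row["conf"]),
                "box": tuple(int(row[k]) for k in ("left", "top", "width", "height")),
            })
    return words


# Hand-written sample with the 12 columns (hypothetical values).
sample = (
    "level\tpage_num\tblock_num\tpar_num\tline_num\tword_num\t"
    "left\ttop\twidth\theight\tconf\ttext\n"
    "1\t1\t0\t0\t0\t0\t0\t0\t400\t120\t-1\t\n"
    "5\t1\t1\t1\t1\t1\t80\t10\t160\t70\t96\t1227\n"
)
print(parse_tsv(sample))
```

Accessing fields by name avoids the magic indices b[6: 10] and b[11], at the cost of a few more lines.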

Thanks a lot for your answer Ann Zen! I did put your `process` function to good use. Please check **EDIT #2** for more details. — Sheldon, Mar 17 '22 at 02:55
@Sheldon I am so happy! It's so amazing how teamwork does the trick. Just yesterday, we pulled together and solved [this](https://stackoverflow.com/q/71385967/13552470) problem. Good luck on your project :) — Red, Mar 17 '22 at 03:05

score 1 · Answer 2 · Esraa Abdelmaksoud · 2022-03-15T02:54:14.373

I have been working with Tesseract for quite some time, so let me clarify something for you. Tesseract is most helpful when you're recognizing text in documents, more so than in other computer vision projects. It usually needs a binarized image to produce good output, so you will always need some image pre-processing.

However, after several trials with all page segmentation modes, I realized that it fails when the font size differs within the same line without a separating space. PSM 6 sometimes helps if the size difference is small, but in your case you may want to try an alternative. If you don't care about the decimals, you may try the following solution:

import cv2
import pytesseract

img = cv2.imread(r'E:\Downloads\Iwzrg.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
img_blur = cv2.GaussianBlur(gray, (3,3),0)
_,thresh = cv2.threshold(img_blur,200,255,cv2.THRESH_BINARY_INV)

# If using a fixed camera
new_img = thresh[0:100, 80:320]

text = pytesseract.image_to_string(new_img, lang='eng', config='--psm 6 --oem 3 -c tessedit_char_whitelist=0123456789')

OUTPUT: 1227
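
A note on the fixed crop: the slice thresh[0:100, 80:320] indexes rows (y) first and columns (x) second. The sketch below converts an (x, y, width, height) box into that pair of slices; `crop_slices` is a hypothetical helper, and the nested list only stands in for an image so that the example runs without NumPy.

```python
def crop_slices(x, y, w, h):
    """Return (row_slice, col_slice) for a box given as x, y, width, height."""
    return slice(y, y + h), slice(x, x + w)


rows, cols = crop_slices(x=80, y=0, w=240, h=100)
print(rows, cols)

# The same slices work on a nested list standing in for a 120 x 400 image.
image = [[(r, c) for c in range(400)] for r in range(120)]
patch = [row[cols] for row in image[rows]]
print(len(patch), len(patch[0]))
```

With a NumPy array the equivalent crop would be thresh[rows, cols].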

edited Mar 15 '22 at 02:54

answered Mar 15 '22 at 00:28

Esraa Abdelmaksoud

Thanks for your reply @Esraa! I was suspecting that the font difference between the integral and fractional parts would be problematic. If I understand your solution correctly, you suggest zooming in on the number and then going for `psm = 6`, *i.e.* assuming a single uniform block of text. This looks promising: I will try this approach with other images. – Sheldon Mar 15 '22 at 19:02
In any case, please let me know if you would recommend other tools than Tesseract to solve this OCR problem. – Sheldon Mar 15 '22 at 19:03
You're always welcome! :) For more complex problems I usually use PaddleOCR, but it also has some problems with detecting spaces and dots. They promised to fix the problem of spaces in the next release, so you may get good results soon if you try it. :) – Esraa Abdelmaksoud Mar 15 '22 at 20:05
Thanks again. Just FYI, I obtained better results by increasing the kernel size. I am not sure whether this is a good practice, but it gave decent results in this specific example. Please check **EDIT2** for more details. – Sheldon Mar 17 '22 at 03:14
That's so nice! Just try more examples and let's see what would happen. I'm just not sure whether the place of your dot changes from an image to another. – Esraa Abdelmaksoud Mar 17 '22 at 03:21

score 1 · Answer 3 · answered Mar 16 '22 at 01:05

I would like to recommend applying another image processing method.

> Because I am dealing with a dark background, I first invert the image, before converting it to grayscale and thresholding it:

You applied global thresholding and couldn't achieve the desired result.

Instead, you can apply either adaptive thresholding or inRange.
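
To make the inRange option concrete, here is a per-pixel sketch of what such a mask computes. `in_range` is a hypothetical pure-Python stand-in for cv2.inRange on one HSV pixel, the bounds are the ones used in the code further down, and the two sample pixels are made up.

```python
def in_range(pixel, lower, upper):
    """Return 255 if every channel lies within [lower, upper], else 0."""
    inside = all(lo <= v <= hi for v, lo, hi in zip(pixel, lower, upper))
    return 255 if inside else 0


LOWER = (0, 0, 250)      # any hue, any saturation, value >= 250
UPPER = (179, 255, 255)  # OpenCV stores 8-bit hue as 0..179

# Hypothetical HSV pixels: a near-white digit stroke and a dim background.
print(in_range((30, 10, 253), LOWER, UPPER))
print(in_range((100, 200, 120), LOWER, UPPER))
```

With these bounds the mask keeps only pixels at near-maximum brightness, regardless of their color.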

For the given image, if we apply the inRange threshold:

To recognize the digits as accurately as possible, we can add a border to the top of the image and resize it (optional).

In the OCR section, check whether the detected region contains only digits:

if text.isdigit():

Then display on the image:

The result is close to the desired value. You can now try the other suggested methods to get the exact value.

The remaining problem is that .938 is recognized as 235; resizing with different values might improve the result.

Code:

from cv2 import imread, cvtColor, COLOR_BGR2HSV as HSV, inRange, getStructuringElement, resize
from cv2 import imshow, waitKey, MORPH_RECT, dilate, bitwise_and, rectangle, putText
from cv2 import copyMakeBorder as addBorder, BORDER_CONSTANT as CONSTANT, FONT_HERSHEY_SIMPLEX
from numpy import array
from pytesseract import image_to_data, Output

bgr = imread("Iwzrg.png")
resized = resize(bgr, (800, 600))  # fx/fy are ignored when an explicit size is given
bordered = addBorder(resized, 200, 0, 0, 0, CONSTANT, value=0)
hsv = cvtColor(bordered, HSV)
mask = inRange(hsv, array([0, 0, 250]), array([179, 255, 255]))
kernel = getStructuringElement(MORPH_RECT, (50, 30))
dilated = dilate(mask, kernel, iterations=1)
thresh = 255 - bitwise_and(dilated, mask)

data = image_to_data(thresh, output_type=Output.DICT)

for i in range(0, len(data["text"])):
    x = data["left"][i]
    y = data["top"][i]
    w = data["width"][i]
    h = data["height"][i]
    text = data["text"][i]

    if text.isdigit():
        print("Text: {}".format(text))
        print("")
        text = "".join([c if ord(c) < 128 else "" for c in text]).strip()
        rectangle(thresh, (x, y), (x + w, y + h), (0, 255, 0), 2)
        putText(thresh, text, (x, y - 10), FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
        imshow("", thresh)
        waitKey(0)
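
One caveat about the isdigit() check used above: it rejects strings containing a decimal point and accepts some non-ASCII digit characters. A stricter alternative can be sketched with a regular expression; `is_ascii_number` and the pattern are illustrative assumptions about which strings should count as numbers here.

```python
import re

# Optional integer part, optional dot, at least one trailing ASCII digit.
NUMBER = re.compile(r"[0-9]*\.?[0-9]+")


def is_ascii_number(text):
    """True for strings such as '1227', '.938', or '1227.938' (ASCII only)."""
    return NUMBER.fullmatch(text) is not None


# "\u00b2" is the superscript two character.
for candidate in ("1227", ".938", "1227.938", "\u00b2", "12m"):
    print(repr(candidate), candidate.isdigit(), is_ascii_number(candidate))
```

Swapping this check in for text.isdigit() would let a correctly read 1227.938 through while still rejecting stray letters.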

Thanks a lot for your reply! I will try using adaptive thresholding in my future implementations. — Sheldon, Mar 17 '22 at 02:53
Also, thanks for suggesting the use of `isdigit`: I am sure that this function will come in handy. So far, it has given mixed results because I am dealing with unicode characters. — Sheldon, Mar 17 '22 at 03:08
