
I am trying to read price text in a video game and am having difficulty pre-processing the image.

The rest of my code is "complete", in the sense that once the text is extracted I format it and write it out to CSV for later use.

This is what I have come up with so far for the following images; I would like input on other thresholds or pre-processing tools that would make the OCR more accurate.

Raw Image Screenshot

After gamma, denoise on left - binary threshold on right

The text detected

As you can see, it is very close but not perfect. I would like to make it more accurate, as I will eventually be processing many frames.

Here is my current code:

import cv2
import pytesseract
import pandas as pd
import numpy as np

# Tells pytesseract where the tesseract environment is installed on local computer
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

img = cv2.imread("./image_frames/frame0.png")

# gamma to darken text to be same opacity?
def adjust_gamma(crop_img, gamma=1.0):
    # build a lookup table mapping the pixel values [0, 255] to
    # their adjusted gamma values
    invGamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** invGamma) * 255
        for i in np.arange(0, 256)]).astype("uint8")
    # apply gamma correction using the lookup table
    return cv2.LUT(crop_img, table)

adjusted = adjust_gamma(img, gamma=0.15)

# grayscale the image
gray = cv2.cvtColor(adjusted, cv2.COLOR_BGR2GRAY)
# denoising image (h=10; template and search window sizes should be odd, e.g. 7 and 21)
dst = cv2.fastNlMeansDenoising(gray, None, 10, 7, 21)


# binary threshold (applied to the denoised image, otherwise the denoising step is unused)
thresh = cv2.threshold(dst, 35, 255, cv2.THRESH_BINARY_INV)[1]


# OCR configurations (3 is default)
config = "--psm 3"

# Just show the image
cv2.imshow("gray", gray)
cv2.imshow("denoised", dst)
cv2.imshow("thresh", thresh)
cv2.waitKey(0)

# Reads text from the image and prints to console
text = pytesseract.image_to_string(thresh, config=config)
# remove double lines
text = text.replace('\n\n','\n')
# remove the form-feed character tesseract appends to its output
text = text.replace('\x0c', '')
print(text)

Any help is appreciated as I am very new to this!

  • I’m sure @Ahx answer provides plenty of useful information but a little concerned they don’t provide any links to guidance about improving tesseract results - you should probably start here https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html - note this result is the top result when googling for _tesseract improve recognition_ which could quite easily have been a search you did? – DisappointedByUnaccountableMod Jan 09 '21 at 00:54
  • Yeah I understand, I've been looking up articles on how to improve pre processing and played around with different thresholds etc but never seemed to be able to improve past this. Potentially didn't look up improving for tesseract specifically... Thank you for the guidance as well! – Lumelity Jan 09 '21 at 19:01
  • I've included the useful links to my answer. You can look at under title "Links" – Ahmet Jan 29 '21 at 19:12

1 Answer


Step#1: Scale the image


Step#2: Apply adaptive-threshold


Step#3: Set page-segmentation-mode (psm) to 6 (Assume a single uniform block of text.)


1 Scaling the image:

  • Scaling makes the characters larger and sharper for the OCR engine, since the original image is quite small.

  • img = cv2.imread("udQw1.png")
    img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
    

2 Apply adaptive-threshold

  • Generally a simple global threshold is applied, but on your image it has little effect on the result; adaptive thresholding handles the uneven background better.

  • For different images you may need to set different C and blockSize values.

  • For instance for the 1st image:

  • gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY_INV, 15, 22)
    
  • Result:

    • (thresholded result image)
  • For instance for the 2nd image:

  • gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                cv2.THRESH_BINARY_INV, 51, 4)
    
  • Result:

    • (thresholded result image)

3 Set psm to 6, which treats the image as a single uniform block of text.

  • txt = pytesseract.image_to_string(thr, config="--psm 6")
    print(txt)
    
  • Result for the 1st image:

    • Dragon Claymore
      1,388,888,888 mesos.
      Maple Pyrope Spear
      288,888,888 mesos.
      Element Pierce
      488,888,888 mesos.
      Purple Adventurer Cape
      97,777,777 mesos.
      
  • Result for the 2nd image:

    • Ring of Alchemist
      749,999,995 mesos.
      Dragon Slash Claw
      499,999,995 mesos.
      "Stormcaster Gloves
      149,999,995 mesos.
      Elemental Wand 6
      749,999,995 mesos.
      
      Big Money Chalr
      
      1 tor 249,999,985 mesos.|
      

Code for the 1st image:


import pytesseract
import cv2

img = cv2.imread("udQw1.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 15, 22)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)

Code for the 2nd image:

import pytesseract
import cv2

img = cv2.imread("7Y2yx.png")
img = cv2.resize(img, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
gry = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thr = cv2.adaptiveThreshold(gry, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                            cv2.THRESH_BINARY_INV, 51, 4)
txt = pytesseract.image_to_string(thr, config="--psm 6")
print(txt)
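Since the question mentions formatting the extracted text into CSV, here is a hedged sketch of one way to pair the alternating item/price lines in the OCR results shown above. `parse_prices` and the regex are my own illustration, not part of the answer; they assume prices always contain the word "mesos".

```python
import csv
import io
import re

# matches a comma-grouped number followed by "mesos"
PRICE_RE = re.compile(r"([\d,]+)\s*mesos", re.IGNORECASE)

def parse_prices(ocr_text):
    """Pair each item-name line with the price line that follows it."""
    rows, pending_item = [], None
    for line in ocr_text.splitlines():
        line = line.strip().strip('"|')  # drop stray OCR quote/pipe artifacts
        if not line:
            continue
        m = PRICE_RE.search(line)
        if m and pending_item:
            rows.append((pending_item, int(m.group(1).replace(",", ""))))
            pending_item = None
        elif not m:
            pending_item = line
    return rows

sample = ("Dragon Claymore\n1,388,888,888 mesos.\n"
          "Maple Pyrope Spear\n288,888,888 mesos.\n")
rows = parse_prices(sample)

# write the (item, price) pairs as CSV
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["item", "price"])
writer.writerows(rows)
```

For many frames, the same loop can sit inside a `cv2.VideoCapture` read loop, running the pre-processing and OCR steps above on each frame before parsing.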

Links

