
I've been working with pytesseract for the past few days, and I've noticed that the library is quite bad at identifying numbers. I do not know if I am doing something wrong, but I keep getting as an output.

import cv2
import pytesseract
from PIL import ImageGrab


class Image_Recognition():
    def digit_identification(self):
        # save normal screenshot
        screen = ImageGrab.grab(bbox=(706,226,1200,726))
        screen.save(r'tmp\tmp.png')

        # read the image file
        img = cv2.imread(r'tmp\tmp.png', 2)
        
        # convert to binary image
        [ret, bw_img] = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)

        # use OCR library to identify numbers in screenshot
        text = pytesseract.image_to_string(bw_img)
        print(text)

INPUT:

Input 1

Input 2

(Converted to a binary image in order to make numbers more intelligible.)

OUTPUT:

Tell me if there is something off, or just suggest other approaches for handling text recognition.

HansHirse
Ekhi Arzac

1 Answer


First of all, please read the article Improving the quality of the output, especially the section regarding the page segmentation method. Also, you can limit the characters to be found to digits 0-9.
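As a minimal sketch, the two hints can be combined into a single Tesseract configuration string. The helper name `digits_config` is made up for illustration; `--psm` and `tessedit_char_whitelist` are the options covered by the linked article, and the resulting string would be passed as `config=` to `pytesseract.image_to_string`.

```python
def digits_config(psm=6, allowed="0123456789"):
    """Build a Tesseract config string that only admits `allowed` characters.

    Hypothetical helper: `psm` selects the page segmentation mode
    (6 assumes a single uniform block of text).
    """
    return f"--psm {psm} -c tessedit_char_whitelist={allowed}"


config = digits_config()
print(config)
# Assumed usage: pytesseract.image_to_string(image, config=config)
```

Which page segmentation mode works best likely depends on the image, so it may be worth trying a few values.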

You have a tiny image, which makes extracting all numbers at once quite challenging, especially given the mixture of bright text on dark background and vice versa. However, you can quite easily crop the individual tiles and extract the numbers one by one, so no distinction between these two types of tiles needs to be made.
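The cropping step comes down to computing one bounding box per tile. Here is a rough sketch, assuming the board fills the whole image and is an evenly divided 4 x 4 grid; the function name `tile_boxes` and the 400 x 400 size are illustrative only.

```python
def tile_boxes(height, width, rows=4, cols=4):
    """Yield (row, col, y1, y2, x1, x2) for each tile of a rows x cols grid."""
    h, w = height // rows, width // cols  # size of a single tile
    for r in range(rows):
        for c in range(cols):
            yield r, c, r * h, (r + 1) * h, c * w, (c + 1) * w


# Hypothetical 400 x 400 pixel board
boxes = list(tile_boxes(400, 400))
print(len(boxes))   # 16
print(boxes[0])     # (0, 0, 0, 100, 0, 100)
print(boxes[-1])    # (3, 3, 300, 400, 300, 400)
```

With an image loaded by OpenCV, each tile would then be `img[y1:y2, x1:x2]`.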

Also, you know that the numbers must be powers of two (I guess most people will know 2048). So, if no such number is found, try upscaling the cropped tile, and repeat. (Eventually, give up after a few attempts.)
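The validate-and-retry idea can be sketched independently of any OCR library. In the sketch below, `read_tile`, `ocr`, `upscale` and `max_tries` are hypothetical names: `ocr` stands in for a call such as `pytesseract.image_to_string`, and `upscale` for something like `cv2.resize` with a factor of 2. It gives up after a fixed number of attempts, which is just one possible stopping rule.

```python
def is_power_of_2(n):
    """True for 1, 2, 4, 8, ...; False for 0 and negative numbers."""
    return n > 0 and (n & (n - 1)) == 0


def read_tile(roi, ocr, upscale, max_tries=4):
    """Return the tile value, or -1 if no power of two is read in max_tries."""
    for _ in range(max_tries):
        text = ocr(roi).strip()  # drops trailing '\n' and '\f'
        if text.isascii() and text.isdigit() and is_power_of_2(int(text)):
            return int(text)
        roi = upscale(roi)  # enlarge the tile and try again
    return -1


# Stand-ins for illustration: the "image" is just its scale factor,
# and the fake OCR only succeeds once the tile is large enough.
fake_ocr = lambda scale: "16\n\f" if scale >= 4 else "7"
double = lambda scale: scale * 2

print(read_tile(1, fake_ocr, double))              # 16
print(read_tile(1, lambda s: "", double))          # -1
```

Passing the OCR and upscaling steps in as functions keeps the retry logic testable without Tesseract installed.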

That'd be my full code:

import cv2
import pytesseract


# https://www.geeksforgeeks.org/python-program-to-find-whether-a-no-is-power-of-two/
# Bitwise test: a positive integer is a power of two if exactly one bit is set.
# Unlike a logarithm-based check, this also handles 0 without raising.
def is_power_of_2(n):
    return n > 0 and (n & (n - 1)) == 0


# Load image, get dimensions of a single tile
img = cv2.imread('T72q4s.png')
h, w = [x // 4 for x in img.shape[:2]]

# Initialize result array (too lazy to import NumPy for that...)
a = cv2.resize(cv2.cvtColor(img, cv2.COLOR_BGR2GRAY), (4, 4)).astype(int)

# https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#page-segmentation-method
# https://stackoverflow.com/q/4944830/11089932
config = '--psm 6 -c tessedit_char_whitelist=0123456789'

# Iterate tiles, and extract texts
for i in range(4):
    for j in range(4):
        
        # Crop tile
        x1 = i * w
        x2 = (i + 1) * w
        y1 = j * h
        y2 = (j + 1) * h
        roi = img[y1:y2, x1:x2]

        # If no proper power of 2 is found, upscale image and repeat
        while True:
            text = pytesseract.image_to_string(roi, config=config)
            text = text.replace('\n', '').replace('\f', '')
            if (text == '') or (not is_power_of_2(int(text))):
                roi = cv2.resize(roi, (0, 0), fx=2, fy=2)
                if roi.shape[0] > 1000:
                    a[j, i] = -1
                    break
            else:
                a[j, i] = int(text)
                break

print(a)

For the given image, I get the following output:

[[ 8 16  4  2]
 [ 2  8 32  8]
 [ 2  4 16  4]
 [ 4  2  4  2]]

For another similar image

Another image

I get:

[[ 4 -1 -1 -1]
 [ 2  2 -1 -1]
 [-1 -1 -1 -1]
 [ 2 -1 -1 -1]]
----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.3
OpenCV:        4.5.3
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
HansHirse