How to identify single digits from image?

Question

I am trying to classify images based on their content. For example, I have many images like the ones below that contain some content, in this case numeric values. I tried the OpenCV and Pytesseract OCR solution proposed here: https://stackoverflow.com/a/60161328/7250310

However, this solution doesn't work on my images, and the content isn't detected. Below are my sample images:

Image 1:

Image 2:

Image 3:

Image 4:

Do you have any other ideas for achieving this? Basically, Image 1 should give 1 as output, and so on.

score 2 · Accepted Answer · answered May 31 '21 at 08:52

This simple approach works at least for the four presented images:

import cv2
import pytesseract

images = ['4sXGS.jpg', 'Nizki.jpg', 'T0EM8.jpg', 'g2fY7.jpg']

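for path in images:
    # Read the image as grayscale
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Inverse binary threshold using Otsu's method
    img = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]

    # Page segmentation mode 10: treat the image as a single character
    text = pytesseract.image_to_string(img, config='--psm 10')
    # Remove the newline and form feed characters from the result
    text = text.replace('\n', '').replace('\f', '')
    print(text)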

Output:

The individual steps are:

Read the image as grayscale.
Inverse binary threshold the image using Otsu's method.
Run pytesseract using the --psm 10 option (single character). Optionally, also add the whitelisting described in the linked answer to restrict recognition to digits.
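The steps above, together with the optional digit whitelist, could be sketched roughly as follows. This is only a sketch, not the exact code from the answer: the helper names `build_config`, `clean_ocr_text`, and `read_digit` are made up for illustration, and `tessedit_char_whitelist` is a Tesseract config variable whose effect may vary between Tesseract versions and OCR engines.

```python
DIGITS = "0123456789"


def build_config(psm=10, whitelist=DIGITS):
    """Build a Tesseract config string: single-character mode plus a whitelist."""
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def clean_ocr_text(text):
    """Remove newlines, form feeds, and surrounding whitespace from OCR output."""
    return text.replace("\n", "").replace("\f", "").strip()


def read_digit(path):
    """Run the three steps on one image file and return the recognized text."""
    # Imported here so the helpers above can be used without OpenCV installed.
    import cv2
    import pytesseract

    # Step 1: read the image as grayscale.
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if img is None:
        raise FileNotFoundError(f"Could not read image: {path}")

    # Step 2: inverse binary threshold using Otsu's method.
    img = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)[1]

    # Step 3: OCR in single-character mode, restricted to digits.
    text = pytesseract.image_to_string(img, config=build_config())
    return clean_ocr_text(text)
```

Calling `read_digit('4sXGS.jpg')` would then run the whole pipeline on one of the sample files. Whether the whitelist actually improves recognition on these images has not been verified here.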

Caveat: I use a special version of Tesseract from the Mannheim University Library.

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.1
OpenCV:        4.5.2
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
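Since the result may depend on the installed Tesseract build, it can help to check which version is actually being picked up. A minimal sketch, assuming the `tesseract` executable is on the PATH; the helper name `tesseract_version` is hypothetical:

```python
import shutil
import subprocess


def tesseract_version(executable="tesseract"):
    """Return the first line of `<executable> --version`, or None if not found."""
    path = shutil.which(executable)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some builds print the version to stderr instead of stdout.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


if __name__ == "__main__":
    print(tesseract_version())
```

pytesseract also provides `pytesseract.get_tesseract_version()`, which reports the version of the Tesseract binary it is configured to use.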

Thank you for sharing. Is there a Mac version of the special version that I can install? I ran the same code with normal Tesseract, and it doesn't work for the digit 1 image. — Fazal, Jun 01 '21 at 04:57
@Fazal Unfortunately, I can't give any advice on that. The "special" mostly refers to the fact that they built their own Windows installer. The underlying source code should be the common (or current) Tesseract 5.0.0.0-alpha. Maybe search for macOS build instructions for that version? What's your version of Tesseract? — HansHirse, Jun 01 '21 at 06:47
4.1.1 is the version I have installed. I tried to find a macOS build but couldn't find one. Maybe it's too complicated for me. — Fazal, Jun 01 '21 at 16:36
