Pytesseract skips "1" but not "10" in the same file

Question

I am working with pytesseract and OpenCV to try to recognize a table of numbers. I have worked heavily on the image, resizing, resampling and thresholding its colors to make it easier for pytesseract to read. Below is the image I managed to generate.

My problem is that every time a single "1" appears on its own in a row, pytesseract fails to recognize it...

This is the image I am trying to read (after applying all the processing mentioned above):

This is the relevant part of the code:

import cv2
import pytesseract

# Load the preprocessed image as grayscale (flag 0)
img = cv2.imread('test.jpg', 0)
data = pytesseract.image_to_string(img)

And this is the output:

10

499

I also tried --psm 10 and --psm 13, but the output is just gibberish like the following:

=
:x

score 1 · Accepted Answer · answered Feb 02 '21 at 20:18

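Apply an inverse binary threshold.

Set the page segmentation mode to 6 (--psm 6, which tells Tesseract to assume a single uniform block of text). The output becomes: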

1
10
499

Code:

import cv2
from pytesseract import image_to_string

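# Load the image and convert it to grayscale
image = cv2.imread('uHLww.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Inverse binary threshold: pixels above 0 become 0, pixels equal to 0 become 255
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV)[1]
# Treat the image as a single uniform block of text
text = image_to_string(thresh, config="--psm 6")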
print(text)
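
To make the inverse binary threshold step above more concrete, here is a minimal pure-Python sketch of the per-pixel rule that OpenCV documents for cv2.THRESH_BINARY_INV: a pixel becomes 0 if it is greater than the threshold, otherwise it becomes maxval. The helper name inverse_binary_threshold and the sample pixel values are made up for illustration; on a real image cv2.threshold does this for the whole array.

```python
def inverse_binary_threshold(pixels, thresh=0, maxval=255):
    """Mimic cv2.THRESH_BINARY_INV on a 2D list of grayscale values.

    Hypothetical helper for illustration only; use cv2.threshold on real images.
    """
    return [[0 if value > thresh else maxval for value in row] for row in pixels]


# Made-up 2x3 grayscale patch: 0 is pure black, 255 is pure white.
sample = [
    [0, 12, 255],
    [200, 0, 1],
]

result = inverse_binary_threshold(sample, thresh=0, maxval=255)
print(result)  # [[255, 0, 0], [0, 255, 0]]
```

With thresh=0, only pixels that are exactly 0 turn white; everything else, including very dark gray, turns black. That suits an input that is already strictly black and white. For an image with gray anti-aliasing, a higher threshold might be a better fit, but that is an assumption and was not tested on this image.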

Second solution:

You don't even have to apply thresholding; setting psm to 6 on its own gives the same result.

import cv2
from pytesseract import image_to_string

print(image_to_string(cv2.imread('uHLww.png'), config="--psm 6"))
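
Since the result depends so much on the page segmentation mode, it can help to run the same image through several modes and compare the outputs side by side. The sketch below is one way to do that, not part of the original answer. The function name ocr_with_psm_modes and the ocr parameter are hypothetical; the parameter exists only so the loop can be tried without Tesseract installed. Modes 3 (the default, fully automatic), 6 (single uniform block), 10 (single character) and 13 (raw line) are the ones discussed in this thread.

```python
def ocr_with_psm_modes(image, modes=(3, 6, 10, 13), ocr=None):
    """Run OCR once per page segmentation mode and collect the results.

    `ocr` is any callable shaped like pytesseract.image_to_string
    (image, config=...). It defaults to pytesseract's function and is a
    parameter here only so the loop can be exercised without Tesseract.
    """
    if ocr is None:
        # Imported lazily so this sketch loads even without pytesseract.
        from pytesseract import image_to_string as ocr
    results = {}
    for mode in modes:
        # Same config string format as in the answer above.
        results[mode] = ocr(image, config=f"--psm {mode}").strip()
    return results


# Example usage (requires OpenCV, pytesseract and the Tesseract binary):
#   import cv2
#   for mode, text in ocr_with_psm_modes(cv2.imread("uHLww.png")).items():
#       print(f"--psm {mode}: {text!r}")
```

Printing each result with repr makes missing lines and stray whitespace easier to spot than printing the raw text.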

Yeah, psm 6 works well with this one. That's weird, as I had already tested it before... Thanks - Leogout, Feb 03 '21 at 08:27
