3

TL;DR: Tesseract appears unable to recognize images that contain only a single digit. Is there a reason for this, or a workaround?

I am using the digits-only configuration of Tesseract to automate entering invoices into our system. However, I noticed that Tesseract seems unable to recognize single-digit numbers such as the following:

The raw scan after crop is:

enter image description here

After I did some image enhancing:

enter image description here

It works fine if it has at least two digits:

enter image description here enter image description here

I've tested on a couple of other figures:

Not working: enter image description here, enter image description here, enter image description here

Working: enter image description here, enter image description here, enter image description here

If it helps, for my purposes all inputs to Tesseract have been cropped and rotated like the ones above. I am using pyocr as a bridge between my project and Tesseract.

Irvan

3 Answers

4

Here's how you can configure pyocr to recognize individual digits:

from PIL import Image
import sys
import pyocr
import pyocr.builders

tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
tool = tools[0]

im = Image.open('digit.png')
builder = pyocr.builders.DigitBuilder()

# Set the page segmentation mode to 10 (single character):
builder.tesseract_layout = 10  # read by the libtesseract backend
builder.tesseract_flags = ['-psm', '10']  # passed to the tesseract command-line backend; Tesseract 4+ spells the option '--psm'

result = tool.image_to_string(im, lang="eng", builder=builder)
print(result)
vSomers
2

Individual digits are handled the same way as other characters, so changing the page segmentation mode should help to pick up the digits correctly.
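To make the page segmentation change concrete, here is a minimal sketch that calls the tesseract command-line tool directly instead of going through pyocr. The function names and the `digit.png` path are hypothetical, and it assumes a `tesseract` binary on the PATH with the stock `digits` config file installed. Tesseract 4 and later spell the option `--psm`; 3.x releases use `-psm`, so the flag is a parameter.

```python
import subprocess


def build_single_char_command(image_path, psm_flag="--psm"):
    """Build a tesseract command line that treats the image as one character.

    Mode 10 is the single-character page segmentation mode. The trailing
    "digits" argument names the config file that restricts output to digits.
    """
    return ["tesseract", image_path, "stdout", psm_flag, "10", "digits"]


def recognize_single_digit(image_path, psm_flag="--psm"):
    """Run tesseract on the image and return the recognized text, stripped."""
    completed = subprocess.run(
        build_single_char_command(image_path, psm_flag),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return completed.stdout.strip()


# Example usage (requires tesseract to be installed):
# print(recognize_single_digit("digit.png"))
```

With the default automatic segmentation mode, a lone character often yields an empty result, which is why forcing mode 10 may help for tightly cropped inputs like the ones in the question.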

See also: Tesseract does not recognize single characters

Community
rmtheis
  • May I ask you to have a look at a Tesseract related question here : https://stackoverflow.com/questions/66946835/improving-accuracy-in-python-tesseract-ocr ? – Istiaque Ahmed Apr 05 '21 at 08:50
0

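Set the page segmentation mode (PageSegMode) to PSM_SINGLE_CHAR, which is mode 10, so that Tesseract treats the image as a single character.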
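For wrappers that expect a number rather than the constant name, the sketch below mirrors Tesseract's PageSegMode values as a Python enum. The `PageSegMode` class and the `psm_number` helper are illustrative names defined here, not part of pyocr or any Tesseract binding; only the numeric values come from Tesseract itself.

```python
from enum import IntEnum


class PageSegMode(IntEnum):
    """Tesseract page segmentation modes, keyed without the PSM_ prefix."""
    OSD_ONLY = 0
    AUTO_OSD = 1
    AUTO_ONLY = 2
    AUTO = 3
    SINGLE_COLUMN = 4
    SINGLE_BLOCK_VERT_TEXT = 5
    SINGLE_BLOCK = 6
    SINGLE_LINE = 7
    SINGLE_WORD = 8
    CIRCLE_WORD = 9
    SINGLE_CHAR = 10
    SPARSE_TEXT = 11
    SPARSE_TEXT_OSD = 12
    RAW_LINE = 13


def psm_number(name):
    """Translate a name such as 'PSM_SINGLE_CHAR' into its numeric mode."""
    prefix = "PSM_"
    key = name[len(prefix):] if name.startswith(prefix) else name
    return int(PageSegMode[key])


print(psm_number("PSM_SINGLE_CHAR"))  # prints 10
```

The resulting number is what a numeric option such as pyocr's `tesseract_layout` or the command-line `--psm` flag takes.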

Fermat's Little Student