
I'm trying to use pytesseract to recognize a two-digit number (49) in an image:

[image: https://i.stack.imgur.com/oAAXR.png]

  • I have tried --psm 6 through 10
  • I have tried -c tessedit_char_whitelist=0123456789

None of these returns 49. The closest I got was 4, without the 9.

Do you have any tips on how to make Tesseract recognize it?

Davide Fiocco
Povilas

3 Answers


Try --psm 13 --oem 3 (--oem 1 or 2 should also work):

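import io

import pytesseract
import requests
from PIL import Image

# Download the image from the question and open it with Pillow
response = requests.get('https://i.stack.imgur.com/oAAXR.png')
image = Image.open(io.BytesIO(response.content))

# --psm 13: treat the image as a single raw text line; --oem 3: default engine mode;
# the whitelist limits the recognized characters to digits
text = pytesseract.image_to_string(image, lang='eng',
                                   config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789')

print(text)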

On my machine this prints 49, as expected.

I get the same result by downloading the image locally and running:

tesseract oAAXR.png output --oem 3 --psm 13 -l eng
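If you would rather drive the command-line binary from Python, a minimal sketch along these lines should work, assuming the tesseract executable is on PATH. tesseract_cli_args and run_tesseract are hypothetical helper names. Passing stdout as the output base makes tesseract print the recognized text instead of writing output.txt, and the optional whitelist mirrors the -c flag used in the pytesseract call above.

```python
import shutil
import subprocess
from typing import List, Optional


def tesseract_cli_args(image_path: str, psm: int = 13, oem: int = 3,
                       lang: str = "eng", whitelist: Optional[str] = None) -> List[str]:
    """Build the argument list for the command above (hypothetical helper)."""
    # "stdout" as the output base prints the text instead of writing output.txt
    args = ["tesseract", image_path, "stdout", "--oem", str(oem), "--psm", str(psm), "-l", lang]
    if whitelist is not None:
        # Same config variable as in the pytesseract example, passed with -c
        args += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return args


def run_tesseract(image_path: str, **options) -> str:
    """Run the tesseract binary and return the recognized text (hypothetical helper)."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    result = subprocess.run(tesseract_cli_args(image_path, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, run_tesseract("oAAXR.png", whitelist="0123456789") should correspond to the command above plus the digit whitelist.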

For reference, my tesseract --version gives:

tesseract 4.0.0
 leptonica-1.77.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 2.0.1) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1
 Found AVX2
 Found AVX
 Found SSE

Davide Fiocco
  • Where is --psm 13 documented? I only see 1-10 here: [TESSERACT(1) Manual Page](https://github.com/tesseract-ocr/tesseract/blob/master/doc/tesseract.1.asc). – user3169 Jan 07 '19 at 06:55
  • Aw, looks like their documentation is inconsistent or refers to different versions; check https://github.com/tesseract-ocr/tesseract/wiki/Command-Line-Usage for psm > 10. – Davide Fiocco Jan 07 '19 at 10:14
  • Thanks for the answer, but your code gives me the string "ay" instead of 49. tesseract versions: tesseract 4.0.0 leptonica-1.77.0 libgif 5.1.4 : libjpeg 9c : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.1 : libopenjp2 2.3.0 Found AVX2 Found AVX Found SSE. I'm also using macOS (Mojave); maybe that has something to do with it. – Povilas Jan 08 '19 at 15:10
  • I have edited the answer with my config, not sure what could be going wrong :( – Davide Fiocco Jan 08 '19 at 21:10
  • Yeah, I can see that the only difference is that your libjpeg is 8d and mine is 9c. Everything else is the same. – Povilas Jan 10 '19 at 12:22
  • This is the fastest ODR I've found after a lot of research. Most number recognition solutions are outdated, but for 2021 this works like a charm. – Vichoko Aug 03 '21 at 04:17
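The version comparison in these comments can be done mechanically. A rough sketch, assuming the multi-line layout that tesseract --version prints (a component name followed by a space or dash and a version, with components on one line joined by " : "). parse_tesseract_version and diff_versions are hypothetical helpers, and the regex is a heuristic, not a specification of that output format.

```python
import re
from typing import Dict, Optional, Tuple

# Heuristic: a component name followed by a space or dash and a version token,
# e.g. "tesseract 4.0.0", "leptonica-1.77.0", "libjpeg 8d (libjpeg-turbo 2.0.1)"
_COMPONENT = re.compile(r"([A-Za-z][\w+]*?)[ -](\d[\w.\-]*)")


def parse_tesseract_version(output: str) -> Dict[str, str]:
    """Map component name -> version from `tesseract --version` text (hypothetical helper)."""
    components: Dict[str, str] = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("Found"):
            continue  # skip CPU feature lines such as "Found AVX2"
        for part in line.split(" : "):
            match = _COMPONENT.match(part.strip())
            if match:
                components[match.group(1)] = match.group(2)
    return components


def diff_versions(a: Dict[str, str],
                  b: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return only the components whose versions differ (None means missing on that side)."""
    return {name: (a.get(name), b.get(name))
            for name in sorted(a.keys() | b.keys())
            if a.get(name) != b.get(name)}
```

Fed the two version strings quoted in this thread (in their multi-line form), it reports libjpeg (8d vs 9c) and also libopenjp2 2.3.0, which only the second string lists.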

Have you tried a different --oem? I would also try a --psm higher than 10.
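One way to follow this advice, and to repeat the --psm 6 to 10 attempts from the question, is to sweep engine and segmentation modes and compare the outputs side by side. A minimal sketch, assuming pytesseract and Pillow are installed and the image is saved locally as oAAXR.png. digit_config, sweep, report, pytesseract_ocr and PSM_DESCRIPTIONS are hypothetical names; the OCR call is injected as a plain function so the loop can be exercised without Tesseract, and the default oems=(1, 3) is an arbitrary choice. Modes whose models are missing (for example --oem 0 or 2 without legacy traineddata) are recorded as errors instead of stopping the sweep.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple

# Page segmentation modes tried below (paraphrased from `tesseract --help-psm`)
PSM_DESCRIPTIONS = {
    6: "single uniform block of text",
    7: "single text line",
    8: "single word",
    9: "single word in a circle",
    10: "single character",
    11: "sparse text",
    12: "sparse text with OSD",
    13: "raw line",
}


def digit_config(psm: int, oem: int, whitelist: str = "0123456789") -> str:
    """Config string in the same form as the answers above."""
    return f"--psm {psm} --oem {oem} -c tessedit_char_whitelist={whitelist}"


def sweep(ocr: Callable[[str], str],
          psms: Iterable[int] = range(6, 14),
          oems: Iterable[int] = (1, 3)) -> Dict[Tuple[int, int], str]:
    """Call ocr(config) for every (oem, psm) pair and collect the stripped text."""
    results: Dict[Tuple[int, int], str] = {}
    for oem, psm in product(oems, psms):
        try:
            results[(oem, psm)] = ocr(digit_config(psm, oem)).strip()
        except Exception as exc:  # e.g. an engine mode whose model is not installed
            results[(oem, psm)] = f"error: {exc}"
    return results


def report(results: Dict[Tuple[int, int], str]) -> None:
    """Print one line per (oem, psm) combination with a short mode description."""
    for (oem, psm), text in results.items():
        print(f"--oem {oem} --psm {psm:>2} ({PSM_DESCRIPTIONS.get(psm, '?')}): {text!r}")


def pytesseract_ocr(image_path: str) -> Callable[[str], str]:
    """Bind one image so that sweep() only varies the config string."""
    import pytesseract  # imported lazily so the helpers above run without it
    from PIL import Image

    image = Image.open(image_path)
    return lambda config: pytesseract.image_to_string(image, lang="eng", config=config)
```

Usage: report(sweep(pytesseract_ocr("oAAXR.png"))). Injecting the OCR function keeps the sweep logic independent of pytesseract, so the same loop could wrap the CLI instead.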

Davide Fiocco
QuarKUS7

For me, the following command returns just 4:

tesseract oAAXR.png out --dpi 300 --psm 11 --oem 1 -c tessedit_char_whitelist=0123456789

Using:

tesseract 4.1.1-rc2-17-g6343
 leptonica-1.76.0
  libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.36 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0
 Found AVX2
 Found AVX
 Found FMA
 Found SSE
 Found libarchive 3.3.3 zlib/1.2.11 liblzma/5.2.4 bz2lib/1.0.6 liblz4/1.8.3 libzstd/1.3.8
Chris