Tesseract does not recognize an image with a single number

Question

I am using Tesseract with Python. It recognizes almost all of my images that contain two or more digits or characters, but it can't recognize an image that contains only a single digit. I also tried the command line, and it responds with "empty page".

I don't want to train Tesseract for "only digits" because I am recognizing characters too.

What is the problem?

Below is the image that Tesseract does not recognize.

Code:

 import pytesseract
 from PIL import Image
 # getPng(pathImg, '3') builds the path to the image file.
 pytesseract.image_to_string(Image.open(getPng(pathImg, '3')))

Technically this isn't a Python question, as the Python side of it seems to be working fine. You can try binarizing the image and making the text smaller. If the text is too big, Tesseract sometimes can't identify the symbols. – Srini, Mar 26 '18 at 20:37
It would help if you could show us the code you have so far. https://stackoverflow.com/help/mcve – hikerjobs, Mar 26 '18 at 20:37
@SrinivasSuresh I removed the "python" tag. I had tried enlarging the image to see if that improved anything; I will try making it smaller, as you now suggest. – Luiza Rodrigues, Mar 26 '18 at 20:46
Are you sure you reduced the area of the image occupied by the symbol and didn't just shrink the entire image? – Srini, Mar 26 '18 at 20:50
Does this answer your question? [Tesseract does not recognize single characters](https://stackoverflow.com/questions/9632044/tesseract-does-not-recognize-single-characters) – user202729, Jan 15 '21 at 05:50

score 10 · Answer 1 · answered Mar 27 '18 at 11:58

If you add the parameter --psm 13, it should work, because Tesseract will then treat the image as a raw text line instead of searching for pages and paragraphs.

So try:

pytesseract.image_to_string(PATH, config="--psm 13")
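A minimal sketch of the suggestion above, assuming pytesseract and Pillow are installed and the tesseract binary is on the PATH. The helper names `psm_config` and `read_single_line` are made up for illustration; only `image_to_string` and its `config` argument come from pytesseract. The 0 to 13 range is the set of page segmentation modes listed by recent Tesseract releases, so older versions may accept fewer.

```python
def psm_config(psm):
    """Build the config string for a page segmentation mode (0-13)."""
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    return f"--psm {psm}"


def read_single_line(image_path, psm=13):
    """OCR one image with the given page segmentation mode (hypothetical helper)."""
    # Imported lazily so psm_config works even without the OCR stack installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        text = pytesseract.image_to_string(image, config=psm_config(psm))
    # The returned text usually ends with whitespace or a form feed; strip it.
    return text.strip()
```

Calling `read_single_line(PATH)` would then return the recognized text, or an empty string if Tesseract still finds nothing.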

answered Mar 27 '18 at 11:58

sinecode

Hello, in the end I made my own image (containing a zero) and concatenated the two images. Every time Tesseract fails to recognize a lone digit, I append the image with the zero. Now that there are two digits, it works. – Luiza Rodrigues Apr 18 '18 at 13:20
this helped some for me, but it thinks a 1 is a 4 :( – John Kurtz Oct 18 '18 at 19:51

score 2 · Answer 2 · answered Feb 21 '19 at 04:16

Try converting the image to grayscale and then to a binary image; most probably Tesseract will then read it. If not, duplicate the image so that there are two characters to read, and then simply extract a single character from the result.
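The steps described here could look roughly like the sketch below. It assumes Pillow is installed and that the digit is dark on a light background (hence the white canvas). All function names, the threshold of 128, and the 10-pixel gap are illustrative choices, not values from the answer.

```python
def threshold_table(threshold=128):
    """256-entry lookup table: levels below the threshold map to 0, the rest to 255."""
    return [0 if level < threshold else 255 for level in range(256)]


def binarize(image, threshold=128):
    """Convert a Pillow image to grayscale, then to pure black and white."""
    return image.convert("L").point(threshold_table(threshold))


def duplicate_side_by_side(image, gap=10):
    """Paste two copies of the image next to each other on a white canvas."""
    from PIL import Image  # lazy import; only needed when images are processed

    gray = image.convert("L")
    width, height = gray.size
    canvas = Image.new("L", (2 * width + gap, height), 255)
    canvas.paste(gray, (0, 0))
    canvas.paste(gray, (width + gap, 0))
    return canvas


def first_character(text):
    """Return the first non-whitespace character of an OCR result, or ''."""
    return text.strip()[:1]
```

With these helpers, the OCR call would be something like `first_character(pytesseract.image_to_string(duplicate_side_by_side(binarize(img))))`, where `img` is an already opened Pillow image.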

answered Feb 21 '19 at 04:16

Ashane.E

Also, try updating your Tesseract version. Tesseract 4.0.0-beta.1 and 4.0.0 work better. – Ashane.E Sep 25 '19 at 14:49

score 0 · Answer 3 · answered Apr 06 '20 at 00:52

Based on ceccoemi's answer, you could try other page segmentation modes (the --psm flag).

For this special case I suggest using --psm 7 (single text line) or --psm 10 (single character):

psm7 = pytesseract.image_to_string(Image.open(getPng(pathImg, '3')), config='--psm 7')
psm10 = pytesseract.image_to_string(Image.open(getPng(pathImg, '3')), config='--psm 10')

More information about these modes can be found in the tesseract wiki.
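To compare several modes on the same image in one go, a small loop can help. This is only a sketch: `try_psm_modes` is a hypothetical helper, and its `ocr` parameter exists purely so that a different OCR callable can be swapped in (it defaults to `pytesseract.image_to_string`, which is assumed to be installed).

```python
def try_psm_modes(image, modes=(7, 10, 13), ocr=None):
    """Run OCR once per page segmentation mode and collect the stripped results."""
    if ocr is None:
        # Lazy import so the function can be exercised with a stand-in callable.
        import pytesseract
        ocr = pytesseract.image_to_string

    results = {}
    for psm in modes:
        results[psm] = ocr(image, config=f"--psm {psm}").strip()
    return results
```

The returned dictionary maps each mode number to the text it produced, which makes it easy to see which mode, if any, reads the single character.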

score 0 · Answer 4 · answered Apr 13 '20 at 02:48

You can use -l osd for a single digit, like this:

tesseract VYO0C.png stdout -l osd --oem 3 --psm 6
2
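The same call can be scripted from Python. This is a sketch, assuming the tesseract binary is on the PATH and the osd traineddata file is installed; `tesseract_command` and `run_tesseract` are hypothetical helper names, and the flags are copied from the command above.

```python
import subprocess


def tesseract_command(image_path, lang="osd", oem=3, psm=6):
    """Build the argument list for the tesseract CLI, printing the result to stdout."""
    return [
        "tesseract", image_path, "stdout",
        "-l", lang,
        "--oem", str(oem),
        "--psm", str(psm),
    ]


def run_tesseract(image_path, **options):
    """Run tesseract and return its stripped standard output."""
    completed = subprocess.run(
        tesseract_command(image_path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract exits with an error
    )
    return completed.stdout.strip()
```

For example, `run_tesseract("VYO0C.png")` runs the exact command shown in this answer.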

answered Apr 13 '20 at 02:48

us2018
