I have the following images:

img01.png

img02.png

When I run tesseract img01.png img01.txt -l eng --psm 7 (and likewise for img02.png), I get the following text:

  • 7.819 0 for the first image and
  • 10.024 for the second one.

The second result is correct. In the first image, however, the last character is a lowercase o, not a zero.

How can I make Tesseract recognize o as o?

Update 1: I tried using the --oem 1 option as suggested in this answer (tesseract --oem 1 img01.png img01-ocred -l eng --psm 7), but it did not help.

Update 2: Binarizing the image using magick img01.png +dither -colors 3 -colors 2 -colorspace gray -normalize img01-binarized.png also didn't help. The binarized image looks like this:

img01-binarized.png

Glory to Russia

1 Answer

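You just need to enlarge the image to twice its original size before running Tesseract. The original is only 40x20 pixels, and after resizing it to 80x40 the last character is read as o instead of 0: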

wget https://i.stack.imgur.com/bSO87.png

identify -format "%wx%h" bSO87.png 
40x20

tesseract -l eng --oem 3 --psm 6 bSO87.png stdout
7.819 0

convert bSO87.png -resize 80x40 bSO87.png

identify -format "%wx%h" bSO87.png 
80x40

tesseract -l eng --oem 3 --psm 6 bSO87.png stdout
7.819 o
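
The same steps can be wrapped in a small script so that the original file is not overwritten. This is only a sketch: scale_geometry and ocr_upscaled are hypothetical helper names, not part of ImageMagick or Tesseract, and the factor of 2 is simply what worked for this image. It assumes identify, convert and tesseract are on the PATH.

```shell
#!/bin/sh
# Hypothetical helper (not an ImageMagick or Tesseract command):
# multiply a WIDTHxHEIGHT string by an integer factor (default 2).
scale_geometry() {
    width=${1%x*}     # text before the "x", e.g. 40
    height=${1#*x}    # text after the "x", e.g. 20
    factor=${2:-2}
    echo "$((width * factor))x$((height * factor))"
}

# Hypothetical wrapper: upscale SRC into DST, then OCR DST to stdout.
# Assumes ImageMagick (identify, convert) and tesseract are installed.
# The tesseract options are the ones used in the transcript above.
ocr_upscaled() {
    src=$1
    dst=$2
    size=$(identify -format '%wx%h' "$src")
    convert "$src" -resize "$(scale_geometry "$size")" "$dst"
    tesseract -l eng --oem 3 --psm 6 "$dst" stdout
}

scale_geometry 40x20    # prints 80x40
```

For example, ocr_upscaled bSO87.png bSO87-2x.png would write the enlarged copy to bSO87-2x.png (an arbitrary output name) and leave bSO87.png untouched. ImageMagick also accepts a percentage geometry, so convert bSO87.png -resize 200% bSO87-2x.png should give the same size without computing the dimensions first.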
us2018