
I have applied pytesseract to three similar images of the digit "2". It recognizes the digit correctly only in the last one. The three images have different dimensions, and if I resize them in the right way, pytesseract recognizes all of them correctly. But I don't understand why a powerful OCR engine like Tesseract fails on such a simple, clear image.

First image: recognition fails

Second image: recognition also fails

Third image: recognized successfully

I'm using Python 3.7 with Anaconda and tesseract v4.0.0.20181030 (leptonica-1.76.0, libgif 5.1.4, libjpeg 8d (libjpeg-turbo 1.5.3), libpng 1.6.34, libtiff 4.0.9, zlib 1.2.11, libwebp 0.6.1, libopenjp2 2.2.0).

  • What is the best way to train Tesseract? – Priscila Timbó Jan 28 '19 at 04:20
  • You don't need to train Tesseract. Just invert the colors of your image, and add a white border if it still doesn't work properly. Tesseract tries to guess the font size from the black pixels in the image, and since your image is white text on a black background, it fails to do so correctly. – Dmitrii Z. Jan 28 '19 at 17:28
  • Dmitrii Z., it is working perfectly now. I appreciate learning how Tesseract works. Thank you very much. – Priscila Timbó Jan 29 '19 at 00:24
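
The fix suggested in the comments (invert the colors, then add a white border) could be sketched roughly as follows with Pillow, which pytesseract already depends on. The helper names `prepare_for_ocr` and `read_digit`, the 10-pixel border, and the `--psm 10` (single character) page segmentation mode are assumptions made for illustration; they are not taken from the thread.

```python
from PIL import Image, ImageOps


def prepare_for_ocr(image, border=10):
    """Return a dark-on-light copy of `image` with a white margin.

    `border` is an assumed default; adjust it for your images.
    """
    gray = image.convert("L")          # single-channel grayscale
    inverted = ImageOps.invert(gray)   # white-on-black becomes black-on-white
    # Pad every side with white (255) pixels.
    return ImageOps.expand(inverted, border=border, fill=255)


def read_digit(path):
    """Hypothetical helper: OCR a single digit from the image file at `path`."""
    # Imported here so the preprocessing step can be used without Tesseract.
    import pytesseract

    prepared = prepare_for_ocr(Image.open(path))
    # --psm 10 asks Tesseract to treat the image as a single character
    # (an assumption that fits the one-digit images in the question).
    return pytesseract.image_to_string(prepared, config="--psm 10").strip()
```

Calling `read_digit` on each of the three image files should then show whether inversion plus padding is enough, or whether resizing is still needed as well.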
