I'm using the python-tesseract wrapper to OCR an image. For certain images, however, the wrapper returns different results than the tesseract command-line tool. On the command line I run tesseract myimg.png myimg && more myimg.txt

The results from the python-tesseract wrapper for the same image are different.

I suspect the mismatch occurs because the wrapper does not find liblept, since the following prints False:

import tesseract
import ctypes
import os
print "HAVE_LIBLEPT=",tesseract.isLibLept()

I also sometimes get these errors when using the wrapper, but never from the command-line tesseract:

Error in pixReduceRankBinary2: hs must be at least 2
Error in pixDilateBrick: pixs not defined
Error in pixExpandReplicate: pixs not defined
Error in pixAnd: pixs1 not defined
Error in pixDilateBrick: pixs not defined
Error in pixExpandReplicate: pixs not defined
Error in pixAnd: pixs2 not defined
Telephone Company Suspicious Activity

Does anyone know what could be causing the mismatch? And how can I tell the wrapper where to find liblept? Since the command-line tesseract works fine, I assume it is finding the library properly:

$ tesseract --version
tesseract 3.02.02
 leptonica-1.69
  libjpeg 8d : libpng 1.5.14 : libtiff 4.0.3 : zlib 1.2.5
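
To narrow down the mismatch, the two outputs could be compared line by line. The following is only a minimal Python 3 sketch: `cli_ocr` and `diff_ocr` are hypothetical helper names, and it assumes the `tesseract` binary is on PATH and writes its result to `<outputbase>.txt`, as in the command above. The wrapper's text has to be obtained separately and passed in as a string.

```python
import difflib
import os
import subprocess
import tempfile


def cli_ocr(image_path, extra_args=()):
    """Run the tesseract CLI on image_path and return the recognized text.

    Mirrors `tesseract myimg.png myimg && more myimg.txt`, but writes to a
    temporary output base. extra_args (hypothetical parameter) is appended
    after the output base, e.g. ("-l", "eng").
    """
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.join(tmp, "out")
        subprocess.run(
            ["tesseract", image_path, base, *extra_args],
            check=True,
            capture_output=True,
        )
        with open(base + ".txt", encoding="utf-8") as f:
            return f.read()


def diff_ocr(cli_text, wrapper_text):
    """Return unified-diff lines between the two OCR outputs.

    Trailing whitespace is ignored so that only real text differences show.
    """
    a = [line.rstrip() for line in cli_text.splitlines()]
    b = [line.rstrip() for line in wrapper_text.splitlines()]
    return list(
        difflib.unified_diff(a, b, fromfile="cli", tofile="wrapper", lineterm="")
    )
```

For example, `diff_ocr(cli_ocr("myimg.png"), wrapper_text)` returns an empty list when both paths produce the same text; otherwise the `-` and `+` lines show where the CLI and the wrapper diverge.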
birdy
Did you find an answer to this? I am comparing the output of the tesseract CLI run with -l eng --oem 3 --psm 11 against pytesseract.image_to_data(im, config='--psm 11 --oem 3 -l eng'), and I can easily see that pytesseract gives me different text, and less relevant text as well. Doesn't pytesseract inherently use tesseract 4.0? – SKR Sep 24 '18 at 04:10
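
On the version question in the comment: pytesseract runs an external `tesseract` executable rather than bundling its own engine, so the version in use is whichever binary it resolves. A rough Python 3 sketch for checking which binary a given command name resolves to and what it reports; `tesseract_version_banner` is a hypothetical helper name, and it assumes the binary is found by the plain command name on PATH:

```python
import shutil
import subprocess


def tesseract_version_banner(cmd="tesseract"):
    """Return the `--version` banner of the binary that `cmd` resolves to.

    Returns None if no such executable is found on PATH.
    """
    path = shutil.which(cmd)
    if path is None:
        return None
    result = subprocess.run([path, "--version"], capture_output=True, text=True)
    # Some builds print the banner to stderr rather than stdout, so check both.
    return (result.stdout or result.stderr).strip()
```

If the banner differs from what `tesseract --version` prints in the shell where the CLI comparison was run, the two code paths are not using the same engine.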
