1

I am using Pytesseract for OCR, but the documentation does not seem to offer any option to extract the confidence of every character. I already have the word-level confidence, but I want to know at which character the confidence drops.

After some research I found that there is a function tesseractExtractResult() in the Tesseract API which can give the confidence of individual characters.

How can I use this function in Python?

Martin Evans
user3809411
  • Similar issue [here](https://stackoverflow.com/questions/48162645/how-to-get-character-wise-confidence-in-tesseract-using-command-line?rq=1) - also no answers. It seems to require source code modification as suggested [here](https://stackoverflow.com/questions/17393555/character-confidence-for-tesseract-3-02-using-config-file) for an older version. – FObersteiner Aug 26 '19 at 16:30
  • I added an answer for this (but tesseract not pytesseract) - see https://stackoverflow.com/questions/48162645/how-to-get-character-wise-confidence-in-tesseract-using-command-line?rq=1 – jtlz2 Sep 03 '19 at 07:26
  • Would you accept a tesseract answer or must it be pytesseract? – jtlz2 Sep 03 '19 at 08:00

1 Answer

1

Pytesseract calls Tesseract in the background as if it were launched in a terminal (this is visible in its source code), so you have at your disposal only what the shell command can do - and as far as I know, you can't get character confidence that way.
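To illustrate what the wrapper does expose, here is a minimal sketch of reading the word-level confidences that the question already mentions. It assumes `pytesseract` and Pillow are installed; `low_confidence_words`, `word_confidences`, and the threshold of 60 are hypothetical names and values chosen for this example, not part of pytesseract.

```python
def low_confidence_words(data, threshold=60.0):
    """Return (word, confidence) pairs whose confidence is below `threshold`.

    `data` is expected to look like the dict returned by
    pytesseract.image_to_data(..., output_type=Output.DICT),
    i.e. parallel lists under the "text" and "conf" keys.
    """
    flagged = []
    for text, conf in zip(data["text"], data["conf"]):
        conf = float(conf)  # conf may arrive as a string or a number
        # Rows that are not words carry empty text and a confidence of -1.
        if text.strip() and 0 <= conf < threshold:
            flagged.append((text, conf))
    return flagged


def word_confidences(image_path):
    """Run Tesseract through pytesseract and return its per-word table."""
    # Imported lazily so the helper above works without these packages.
    import pytesseract
    from PIL import Image
    from pytesseract import Output

    return pytesseract.image_to_data(Image.open(image_path),
                                     output_type=Output.DICT)
```

Calling `low_confidence_words(word_confidences("scan.png"))` (with a hypothetical file name) would list the weak words, but this still stops at word granularity.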

I think that pyocr should be able to do so, but the function call would need to be added (maybe in tesseract_raw.py?).

Also, more as a note: it seems that python-tesseract and pytess each have at least a few lines of code referring to tesseractExtractResult, but their last commits were in 2015 and 2012 respectively.

AleG
  • Thanks for the help. I was able to get it using tesserocr, but all the characters seemed to have a confidence of approximately 98, while the word confidence was 32. So it was not much help here. – user3809411 Aug 29 '19 at 08:05
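
For readers who want to try the tesserocr route mentioned in the comment above, a minimal sketch follows. It assumes `tesserocr` is installed and uses its result iterator at symbol level; `symbol_confidences` and `weakest_symbols` are hypothetical helper names introduced for this example. As the comment reports, the per-character values may come out nearly uniform, so this may not pinpoint the weak character.

```python
def symbol_confidences(image_path):
    """Return a list of (character, confidence) pairs for an image."""
    # Imported lazily so weakest_symbols() works without tesserocr.
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    results = []
    with PyTessBaseAPI() as api:
        api.SetImageFile(image_path)
        api.Recognize()
        iterator = api.GetIterator()
        # RIL.SYMBOL walks the result one recognized character at a time.
        for symbol in iterate_level(iterator, RIL.SYMBOL):
            results.append((symbol.GetUTF8Text(RIL.SYMBOL),
                            symbol.Confidence(RIL.SYMBOL)))
    return results


def weakest_symbols(symbols, count=3):
    """Return the `count` pairs with the lowest confidence, lowest first."""
    return sorted(symbols, key=lambda pair: pair[1])[:count]
```

Something like `weakest_symbols(symbol_confidences("scan.png"))` (with a hypothetical file name) would then show the characters Tesseract was least sure about.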