I am able to get word-level confidence scores using Tesseract 4.0 from the command line. I would like to know whether there is a way to get character-level confidence too.

For word-level confidence, I used the following command:

tesseract [Image name] outputbase --oem 1 -l eng --psm 8 tsv
  • I also have the same question. If you found an answer, please share it with me. – udya Dec 03 '18 at 07:10
  • Possible duplicate of [Does Tesseract's hOCR output really contain bounding boxes and confidence levels for each character?](https://stackoverflow.com/questions/15829148/does-tesseracts-hocr-output-really-contain-bounding-boxes-and-confidence-levels) – jtlz2 Sep 04 '19 at 06:59

1 Answer

Set hocr_char_boxes to 1 in your config file. Or, at the command line, your updated command would be:

tesseract [Image name] outputbase --oem 1 -l eng --psm 8 -c hocr_char_boxes=1 hocr

Note the hocr output option (in place of tsv). Look in the resulting outputbase.hocr file for the ..._wconf confidence values, e.g.

 <span class='ocrx_word' id='word_1_1' title='bbox 127 344 4618 6915; x_wconf 1'>
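To pull the per-character values out of the hOCR file programmatically, something like the following Python sketch may help. It assumes that, with hocr_char_boxes=1, each character is emitted as a span with class ocrx_cinfo whose title carries an x_conf value; those two names are my recollection of the Tesseract output rather than something shown above, so check them against your own file. The helper names (parse_title, CharConfParser, char_confidences) are made up for this example.

```python
from html.parser import HTMLParser


def parse_title(title):
    """Split an hOCR title like 'bbox 1 2 3 4; x_wconf 90' into {key: [values]}."""
    props = {}
    for part in title.split(";"):
        tokens = part.split()
        if tokens:
            props[tokens[0]] = tokens[1:]
    return props


class CharConfParser(HTMLParser):
    """Collect (character, confidence) pairs from character-level spans.

    The defaults 'ocrx_cinfo' and 'x_conf' are assumptions about what
    Tesseract writes when hocr_char_boxes=1; adjust them if your file differs.
    """

    def __init__(self, char_class="ocrx_cinfo", conf_key="x_conf"):
        super().__init__()
        self.char_class = char_class
        self.conf_key = conf_key
        self.results = []
        self._in_char = False
        self._conf = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("class") == self.char_class:
            values = parse_title(attrs.get("title") or "").get(self.conf_key)
            self._conf = float(values[0]) if values else None
            self._in_char = True
            self._text = []

    def handle_data(self, data):
        if self._in_char:
            self._text.append(data)

    def handle_endtag(self, tag):
        # Character spans contain no nested spans, so the first closing
        # span after an opening character span belongs to it.
        if tag == "span" and self._in_char:
            self.results.append(("".join(self._text), self._conf))
            self._in_char = False


def char_confidences(hocr_text):
    """Return a list of (character, confidence) tuples found in hOCR text."""
    parser = CharConfParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.results
```

With the command above, the input would be the contents of outputbase.hocr, e.g. char_confidences(open("outputbase.hocr", encoding="utf-8").read()). If the list comes back empty, the class or key name in your Tesseract version probably differs from the assumed defaults.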

Let me know if this works for you; otherwise I'll just delete the answer.

Source: https://github.com/tesseract-ocr/tesseract/issues/1465#issuecomment-513139976

jtlz2