Improving Tesseract OCR accuracy on screenshot

Question

Tesseract OCR gives rather erratic results on screenshots. Only some of the text is recognized correctly, even though the image is a plain black background with white text on it. Even after I resize the image to 300 DPI, the accuracy remains low and most of the output is gibberish.

I read a similar question on Stack Overflow: Best way to recognize characters in screenshot?

As mentioned there, the author of that question was able to get nearly 100% accuracy by training the Tesseract engine on his font.

The font in my image is Arial. How can I improve the accuracy?

Here is a sample of the kind of images I have: Image Sample

score 1 · Answer 1 · answered Jun 19 '19 at 12:03


You can experiment with Tesseract's configuration by changing the --psm (page segmentation mode) and --oem (OCR engine mode) values.

For example, try: --psm 5 --oem 2

You can also look at the following link for further details: here
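A minimal sketch of passing these flags from Python, assuming the tesseract binary is on PATH and English language data is installed. The helper names build_tesseract_cmd and run_ocr are hypothetical. Note that --psm 5 tells Tesseract to assume a single uniform block of vertically aligned text, so for ordinary horizontal screenshot text other modes such as 6 (a single uniform block of text) are worth comparing; --oem 2 (legacy plus LSTM engines) also only works if the installed traineddata includes the legacy model.

```python
import subprocess


def build_tesseract_cmd(image_path, psm=5, oem=2, lang="eng"):
    """Return the tesseract CLI arguments for one image (hypothetical helper)."""
    # Using "stdout" as the output base makes tesseract print the text
    # instead of writing it to a .txt file.
    return [
        "tesseract", image_path, "stdout",
        "--psm", str(psm),  # page segmentation mode
        "--oem", str(oem),  # OCR engine mode
        "-l", lang,         # language data to use
    ]


def run_ocr(image_path, psm=5, oem=2, lang="eng"):
    """Run tesseract (assumed to be on PATH) and return the recognized text."""
    cmd = build_tesseract_cmd(image_path, psm, oem, lang)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

For example, run_ocr("screenshot.png", psm=6) would return the text of a hypothetical screenshot.png; calling it in a loop over a few psm values makes it easy to compare which mode works best for your images.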


sameer maurya


score 0 · Answer 2 · answered Mar 18 '21 at 18:08

This question is old, but it comes up first in a Google search, so I thought I'd answer. I had a very similar issue and thought I'd go crazy, but then by chance I found this page: https://tesseract-ocr.github.io/tessdoc/ImproveQuality

Under "Inverting images" it says: "While tesseract version 3.05 (and older) handle inverted image (dark background and light text) without problem, for 4.x version use dark text on light background."

I negated the image with ImageMagick and there we go: 100% match!
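The fix above amounts to flipping every pixel so the text becomes dark on a light background. The sketch below shows that operation in plain Python on an 8-bit grayscale image stored as a list of rows, plus a simple heuristic for deciding when to invert (mean brightness below 128, an assumed threshold, not something from the Tesseract docs). The function names are hypothetical. In practice you would do the same step with ImageMagick (convert input.png -negate output.png, or magick instead of convert in ImageMagick 7) or with Pillow's ImageOps.invert before passing the image to Tesseract.

```python
def mean_brightness(pixels):
    """Average value of an 8-bit grayscale image (list of rows, 0 = black, 255 = white)."""
    total = sum(sum(row) for row in pixels)
    count = sum(len(row) for row in pixels)
    return total / count


def negate(pixels):
    """Invert every pixel, like ImageMagick's -negate: light-on-dark becomes dark-on-light."""
    return [[255 - p for p in row] for row in pixels]


def prepare_for_tesseract4(pixels, threshold=128):
    """Negate the image only if it looks mostly dark (assumed heuristic threshold)."""
    if mean_brightness(pixels) < threshold:
        return negate(pixels)
    return pixels
```

Checking the mean brightness first means already dark-on-light images pass through unchanged, so the same preprocessing step can be applied to every image.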
