Solve captcha using Tesseract version 3.04

Question

How can I solve a captcha using tesseract?

I have preprocessed the image using ImageMagick, but so far I have failed to solve it.

Below you can find the image that I am using:

I used the following command, since the captcha includes Cyrillic letters:

tesseract output.png test -l bul+eng

score 0 · Answer 1 · answered Jun 13 '18 at 14:25

It's no secret that Tesseract is not an all-in-one OCR tool that recognizes every kind of text and figure. In fact, that couldn't be further from the truth once you work with real documents that vary greatly in brightness, clarity, and perspective. Your case is relatively simple, since the characters do not overlap and are clearly distinguishable from the background. So, this is good news!

To begin with, I'd use the Tesseract library from code rather than relying on its command-line interface. The terminal works, but it lacks flexibility, since it limits you to the few image operations you can do there. Although ImageMagick is an extensive image-processing tool, in my experience you're more likely to get better results by using such libraries, e.g. ImageMagick or OpenCV, in your code.

Just to give you a quick start on Tesseract and avoid repeating myself, I'll link one of my previous answers to a similar question. I don't know how familiar you are with Python, but I hope you'll be able to follow.
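For a rough idea of what "using the library in your code" can look like, here is a minimal sketch in Python. It assumes the `opencv-python` and `pytesseract` packages plus a Tesseract install with the `bul` and `eng` language data. The helper names `build_config` and `read_captcha`, the 2x upscale, the median blur and the Otsu threshold are all illustrative choices of mine, not something tuned for your image. As far as I know, Tesseract 3.04 spells the page segmentation flag `-psm` while 4.x uses `--psm`, so the sketch keeps that switchable.

```python
def build_config(psm=7, legacy_flags=True):
    """Build the extra Tesseract options that pytesseract passes through.

    psm 7 asks Tesseract to treat the image as a single text line,
    which is a common starting point for captchas (an assumption here).
    legacy_flags=True emits the 3.04-style `-psm`; False emits `--psm`.
    """
    flag = "-psm" if legacy_flags else "--psm"
    return f"{flag} {psm}"


def read_captcha(path, lang="bul+eng", legacy_flags=True):
    """Binarize the image with OpenCV, then run Tesseract on the result."""
    # Imported here so build_config stays usable without the OCR stack.
    import cv2
    import pytesseract

    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if image is None:
        raise FileNotFoundError(f"could not read image: {path}")

    # Upscale first: Tesseract tends to do better on larger glyphs.
    image = cv2.resize(image, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
    # Light median blur to suppress speckle noise before thresholding.
    image = cv2.medianBlur(image, 3)
    # Otsu's method picks the black/white threshold automatically.
    _, binary = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    # Recent pytesseract versions accept NumPy arrays directly.
    config = build_config(psm=7, legacy_flags=legacy_flags)
    text = pytesseract.image_to_string(binary, lang=lang, config=config)
    return text.strip()
```

Calling `read_captcha("output.png")` would then return the recognized string. If the text comes out white on black after thresholding, inverting it with `cv2.bitwise_not(binary)` before the OCR call is worth a try, since Tesseract generally expects dark text on a light background.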
