I'm having trouble preprocessing this CAPTCHA image so that an OCR engine such as Tesseract can read it.

In particular, I'm having trouble with the vertical line that intersects the u.
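For context, one common way to attack a line like this is sketched below under stated assumptions. Threshold the image so that ink is `True`, isolate long vertical runs with a morphological opening that uses a tall, 1-pixel-wide kernel, subtract them, and then patch the gap left where the line crossed a letter. This is a NumPy-only sketch, not a tested solution for this CAPTCHA. `remove_vertical_lines`, `_morph` and `min_len` are hypothetical names. `min_len` (odd) is assumed to be taller than any letter stem but shorter than the line. The 1x3 closing only bridges gaps one pixel wide, so a thicker line would need a wider kernel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _morph(img, kh, kw, op):
    """Binary morphology with a kh x kw rectangle (odd sizes).

    op=np.all gives erosion, op=np.any gives dilation; pixels outside the
    image are treated as background.
    """
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), constant_values=False)
    return op(sliding_window_view(padded, (kh, kw)), axis=(-2, -1))


def remove_vertical_lines(img, min_len):
    """Remove vertical ink runs at least min_len pixels tall (img: bool, True = ink)."""
    # Opening (erode, then dilate) with a tall 1-px-wide kernel keeps only long vertical runs.
    lines = _morph(_morph(img, min_len, 1, np.all), min_len, 1, np.any)
    no_lines = img & ~lines
    # Closing (dilate, then erode) with a 1x3 kernel bridges 1-px horizontal gaps...
    closed = _morph(_morph(no_lines, 1, 3, np.any), 1, 3, np.all)
    # ...but restored pixels are only accepted on the removed line, so nearby letters are not merged.
    return no_lines | (closed & lines)


# Toy example (hypothetical): a full-height line, a blob away from it, and a stroke crossing it.
img = np.zeros((20, 30), dtype=bool)
img[:, 15] = True
img[8:13, 3:8] = True
img[3, 12:19] = True

out = remove_vertical_lines(img, min_len=15)
print(out[:, 15].nonzero()[0])  # [3]: only the pixel where the stroke crosses survives
```

On the real image, a letter stem taller than `min_len` would be removed as well, so the kernel height would have to be tuned against the actual CAPTCHA.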

Tesseract has no problem at all reading k3cvx. The tight kerning in ju5k, together with the vertical line, is what makes Tesseract inaccurate.

I've tried dilating and eroding the image, but that degrades the letters to the point where Tesseract can't read them.
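For reference, this matches how binary erosion behaves. A pixel survives only if every pixel under the kernel is ink, so any stroke narrower than the kernel disappears, and dilating afterwards cannot bring it back. Below is a minimal NumPy-only sketch, assuming a thresholded image with ink as `True`. `erode`, `dilate` and the synthetic strokes are hypothetical and only illustrate the effect.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def erode(img, kh, kw):
    """Binary erosion with a kh x kw rectangle (odd sizes); outside the image counts as background."""
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), constant_values=False)
    return sliding_window_view(padded, (kh, kw)).all(axis=(-2, -1))


def dilate(img, kh, kw):
    """Binary dilation with a kh x kw rectangle (odd sizes)."""
    padded = np.pad(img, ((kh // 2, kh // 2), (kw // 2, kw // 2)), constant_values=False)
    return sliding_window_view(padded, (kh, kw)).any(axis=(-2, -1))


# Toy image: a 2-px-wide stroke next to a 5-px-wide stroke (True = ink).
img = np.zeros((12, 16), dtype=bool)
img[1:11, 2:4] = True    # thin stroke
img[1:11, 8:13] = True   # thick stroke

eroded = erode(img, 3, 3)
print(eroded[:, 2:4].any())   # False: the thin stroke is gone
print(eroded[:, 8:13].any())  # True: the thick stroke survives, thinned

# Dilating afterwards (an "opening") only regrows what survived the erosion.
opened = dilate(eroded, 3, 3)
print(opened[:, 2:4].any())   # False: the thin stroke does not come back
```

This is why a kernel shaped to the defect, such as a tall 1-pixel-wide one, usually does less damage to the letters than a square one.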

Original Image - no further preprocessing:


Second Image - After trying to remove the vertical lines using the solution found here:

Third Image - Median filter, then erosion, applied to the second image:

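For comparison, a 3x3 median filter replaces each pixel with the median of its neighbourhood. An isolated speck vanishes, but stroke corners are rounded off as well, so a median filter followed by erosion removes ink from the letters in both steps. Below is a minimal NumPy-only sketch on a grayscale array with dark text on a white background. `median_filter` and the toy image are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def median_filter(gray, k=3):
    """k x k median filter (odd k) on a 2-D array, replicating edge pixels at the border."""
    padded = np.pad(gray, k // 2, mode="edge")
    windows = sliding_window_view(padded, (k, k))
    # With an odd window the median is one of the original values, so casting back is exact.
    return np.median(windows, axis=(-2, -1)).astype(gray.dtype)


gray = np.full((10, 10), 255, dtype=np.uint8)  # white background
gray[1:4, 1:9] = 0   # a 3-px-thick dark stroke
gray[6, 5] = 0       # an isolated dark speck

smoothed = median_filter(gray)
print(smoothed[6, 5])    # 255: the speck is gone
print(smoothed[2, 1:9])  # [0 0 0 0 0 0 0 0]: the stroke's middle row is kept
print(smoothed[1, 1])    # 255: but the stroke's corner is rounded off
```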

Any help would be appreciated.

DreadedSlug
    I suggest that this activity is unethical. Attempting to subvert the CAPTCHA protection shows a lack of respect for the owner of the server, whether they are doing it to protect their bandwidth or their business – fmw42 Feb 20 '21 at 05:27
    @fmw42 This is for personal knowledge, not for "subverting" a CAPTCHA on someone's server. There are MANY related questions on this site alone. Not sure why you decided to label mine unethical. – DreadedSlug Feb 20 '21 at 05:35
    @fmw42 CAPTCHA as an idea is a technological arms race. Without people trying to crack or subvert CAPTCHAs, innovation wouldn't happen. Text-based CAPTCHAs are basically already solved with machine learning, which is one reason we have alternatives like reCAPTCHA. This honestly looks like a personal exercise in computer vision. What he uses it for is up to him. There are hundreds of video tutorials that use OpenCV to solve text CAPTCHAs as an educational exercise. I don't see the point of your comment. – Gunner Stone Feb 20 '21 at 05:42
