I am doing penetration testing on my friend's website, and I've spotted a captcha on the site that looked easy to solve.
enter image description here

After applying a Gaussian blur and then a simple threshold, I ended up with the following:
enter image description here
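
For reference, a minimal sketch of this kind of preprocessing on a raw grayscale array. It is not the exact code used for the image above: it assumes a 3x3 Gaussian kernel and a fixed cutoff, and the names `CaptchaPreprocess`, `GaussianBlur` and `Threshold` are made up for illustration.

```csharp
using System;

static class CaptchaPreprocess
{
    // 3x3 Gaussian kernel; the weights sum to 16 (assumed kernel size).
    static readonly int[,] Kernel = { { 1, 2, 1 }, { 2, 4, 2 }, { 1, 2, 1 } };

    // Blurs a grayscale image indexed as [row, column].
    // Pixels outside the image are replaced by the nearest edge pixel.
    public static byte[,] GaussianBlur(byte[,] gray)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var result = new byte[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
            {
                int sum = 0;
                for (int ky = -1; ky <= 1; ky++)
                    for (int kx = -1; kx <= 1; kx++)
                    {
                        int yy = Math.Min(Math.Max(y + ky, 0), h - 1);
                        int xx = Math.Min(Math.Max(x + kx, 0), w - 1);
                        sum += Kernel[ky + 1, kx + 1] * gray[yy, xx];
                    }
                result[y, x] = (byte)(sum / 16);
            }
        return result;
    }

    // Marks every pixel darker than the cutoff as ink (true).
    public static bool[,] Threshold(byte[,] gray, byte cutoff)
    {
        int h = gray.GetLength(0), w = gray.GetLength(1);
        var ink = new bool[h, w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                ink[y, x] = gray[y, x] < cutoff;
        return ink;
    }
}
```

The cutoff has to be tuned per captcha style; a value around the middle of the gray range is only a starting point.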
After feeding the thresholded image to tesseract-ocr, I got the following output:
CLBTJE

So the OCR failed to recognize the last two characters of the text.
I suspect the main issue is that tesseract can't segment the 'T' and the 'X'.

My main question is then: is it possible to force tesseract to do the segmentation, or do I have to implement it myself?
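
If I do have to implement it myself, one common approach (an assumption on my part, not something tesseract provides) is vertical projection: count the ink pixels in each column of the thresholded image and cut wherever a column falls below a minimum. The sketch below works on a plain `bool[,]` mask; `CaptchaSegmenter`, `SplitColumns` and `minInkPerColumn` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

static class CaptchaSegmenter
{
    // Returns one { start, end } column range per glyph candidate, end exclusive.
    // ink is indexed as [row, column]; true means the pixel is part of a character.
    public static List<int[]> SplitColumns(bool[,] ink, int minInkPerColumn = 1)
    {
        int h = ink.GetLength(0), w = ink.GetLength(1);
        var ranges = new List<int[]>();
        int start = -1; // -1 means we are currently between glyphs

        // Iterate one column past the end so a glyph touching the right edge is closed.
        for (int x = 0; x <= w; x++)
        {
            int count = 0;
            if (x < w)
                for (int y = 0; y < h; y++)
                    if (ink[y, x]) count++;

            bool hasInk = x < w && count >= minInkPerColumn;
            if (hasInk && start < 0)
            {
                start = x; // a new glyph begins here
            }
            else if (!hasInk && start >= 0)
            {
                ranges.Add(new[] { start, x }); // the glyph ended at the previous column
                start = -1;
            }
        }
        return ranges;
    }
}
```

With the default of 1, touching characters such as the 'T' and the 'X' would come back as a single range, since no empty column separates them. Raising `minInkPerColumn` can cut at a thin join, but it may also split a letter with a thin stroke, so this is a starting point rather than a full solution.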

Here is the C# code I'm using to perform OCR:

var image = new Bitmap(pictureBox1.Image);
var ocr = new Tesseract();
// Restrict recognition to upper- and lower-case Latin letters.
const string letters = "QWERTYUIOPASDFGHJKLZXCVBNM";
ocr.SetVariable("tessedit_char_whitelist", letters + letters.ToLower());
ocr.Init(@"tessdata", "eng", false);
var result = ocr.DoOCR(image, new Rectangle());
// Show each recognized word together with its confidence value.
foreach (Word word in result)
    MessageBox.Show("Confidence : " + word.Confidence + ", Word : " + word.Text);
Lee Taylor
user3788486
