In an effort to see if it is possible to easily break very simple CAPTCHAs, I am attempting to write a program (as simple and small as possible). This program, which I hope to write in C++, should do the following:

  1. Make a partial screenshot of a known area of the screen. (Assuming that the CAPTCHA is always in the exact same place - for instance: Pixels 500-600 x, pixels 300-400 y).

  2. Automatically dissect the CAPTCHA into individual letters. (The CAPTCHAs I will create for testing will all have only a few white letters, always on a black background and spaced well apart, to make things easy on me.)

  3. The program then compares the "cut" letters against an array of "known" images of letters (which look similar to the letters used in the CAPTCHA), which contains 26 elements, each holding an image of a single letter of the English alphabet.

  4. The program takes the letter associated with the image that the comparison mapped to, and sends that letter to the console (via std::cout).
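Step 2 can be sketched without any library, given the constraints stated above (white letters, black background, clear gaps between letters). The following is a minimal sketch, not a definitive implementation: it assumes the screenshot has already been converted to a rectangular binary grid, and the names `Image`, `findLetterColumns`, and `cropColumns` are made up for illustration. A letter is taken to be a run of adjacent columns that each contain at least one white pixel.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Assumed representation: rows of pixels, 1 = white (letter), 0 = black (background).
// All rows are assumed to have the same length.
using Image = std::vector<std::vector<int>>;

// Returns the inclusive [first, last] column range of each letter, left to right.
// A letter is a maximal run of columns that contain at least one white pixel.
std::vector<std::pair<std::size_t, std::size_t>> findLetterColumns(const Image& img) {
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    if (img.empty()) return ranges;

    const std::size_t width = img[0].size();
    bool inLetter = false;
    std::size_t start = 0;

    for (std::size_t x = 0; x < width; ++x) {
        // Does column x contain any white pixel?
        bool hasWhite = false;
        for (const auto& row : img) {
            if (row[x] != 0) { hasWhite = true; break; }
        }
        if (hasWhite && !inLetter) {          // a letter begins here
            inLetter = true;
            start = x;
        } else if (!hasWhite && inLetter) {   // the letter ended at the previous column
            inLetter = false;
            ranges.emplace_back(start, x - 1);
        }
    }
    if (inLetter) ranges.emplace_back(start, width - 1);  // letter touches the right edge
    return ranges;
}

// Copies columns [first, last] (inclusive) of img into a new image.
Image cropColumns(const Image& img, std::size_t first, std::size_t last) {
    Image out;
    for (const auto& row : img) {
        out.emplace_back(row.begin() + static_cast<std::ptrdiff_t>(first),
                         row.begin() + static_cast<std::ptrdiff_t>(last) + 1);
    }
    return out;
}
```

Calling `cropColumns` once per range returned by `findLetterColumns` yields one small image per letter. This only works because the letters never touch or overlap; CAPTCHAs with joined or distorted letters need a different segmentation approach.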

My question is: Is there an easy-to-use library (I am only a beginner at programming) that can handle tasks 1-3? (Task 4 is rather easy.) The third point in particular is something I have found almost nothing worthwhile on. Ideally, the library would have a "score" function that returns a float indicating how similar two images are, so that the known image with the highest score is the best hit. (For example, 100.0 means the images are identical, 29.56 means they are very different, etc.)
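To make the "score" idea concrete, here is one possible hand-rolled version for binary images. It is only a sketch under assumptions: both images must have the same dimensions (so a cut letter would first have to be scaled to the size of the known images), the score is simply the percentage of pixels that agree, and `similarityScore` and `bestMatch` are hypothetical names, not functions from any library.

```cpp
#include <cstddef>
#include <vector>

// Assumed representation: rows of pixels, nonzero = white, 0 = black.
using Image = std::vector<std::vector<int>>;

// Percentage (0.0 to 100.0) of pixels that agree between two binary images.
// Returns 0 if the images are empty or their dimensions differ.
float similarityScore(const Image& a, const Image& b) {
    if (a.empty() || a.size() != b.size()) return 0.0f;

    std::size_t same = 0;
    std::size_t total = 0;
    for (std::size_t y = 0; y < a.size(); ++y) {
        if (a[y].size() != b[y].size()) return 0.0f;
        for (std::size_t x = 0; x < a[y].size(); ++x) {
            if ((a[y][x] != 0) == (b[y][x] != 0)) ++same;
            ++total;
        }
    }
    if (total == 0) return 0.0f;
    return 100.0f * static_cast<float>(same) / static_cast<float>(total);
}

// Returns the letter whose known image scores highest against `letter`,
// or '?' if `known` is empty. Assumption: known[i] is the image for 'A' + i.
char bestMatch(const Image& letter, const std::vector<Image>& known) {
    char best = '?';
    float bestScore = -1.0f;
    for (std::size_t i = 0; i < known.size(); ++i) {
        const float s = similarityScore(letter, known[i]);
        if (s > bestScore) {
            bestScore = s;
            best = static_cast<char>('A' + i);
        }
    }
    return best;
}
```

With 26 known images in alphabetical order, `std::cout << bestMatch(cutLetter, known);` would cover step 4. Counting matching pixels is a deliberately simple choice: black background pixels count as agreement too, so even very different letters score well above 0.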

John Torwalds
  • The captcha you are describing manages to be simpler than the one "broken" at http://stackoverflow.com/a/13665185/1832154. The code presented there is very simple; you should have no trouble mapping it to C++ using OpenCV or even other, less huge libraries. – mmgp Feb 17 '13 at 15:16
  • I'm pretty sure that taking a screenshot is fairly system-dependent. Which system are you using? – Xymostech Feb 17 '13 at 15:18
  • Ah, yes, there is the screenshot. I completely ignored it, because it is an entirely different task from solving the CAPTCHA and only a matter of calling the right functions for a given platform. – mmgp Feb 17 '13 at 15:20
  • I am using Windows 7. Pardon me for forgetting to mention this. – John Torwalds Feb 17 '13 at 16:17

1 Answer

A good library for this job is OpenCV. http://opencv.org

OpenCV has all the necessary low-level image processing tools to segment the different elements of the captcha. You can then use its template matching module to compare each segmented letter against your known letter images.

You could even try to detect letters directly, without the preprocessing, by searching the whole captcha image for each letter template. It will be slower, but the captcha image is typically so small that it should rarely matter. See: http://docs.opencv.org/modules/imgproc/doc/object_detection.html#cv2.matchTemplate
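To show what searching for a letter directly involves, here is a plain C++ sketch of the idea behind template matching. It is not OpenCV code: it slides the letter template over every position in the image and keeps the position with the smallest sum of squared pixel differences, which is the kind of score map `cv::matchTemplate` computes with its `TM_SQDIFF` method. The names `Gray`, `Match`, and `findTemplate` are invented for this example, and both images are assumed to be rectangular grayscale grids.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Assumed representation: rows of grayscale pixels in the range 0..255.
using Gray = std::vector<std::vector<int>>;

// Best position found for the template's top-left corner.
struct Match {
    std::size_t x;
    std::size_t y;
    long long sqDiff;  // sum of squared differences; 0 means a perfect match
    bool found;        // false if the template could not be placed anywhere
};

// Slides `templ` over `image` and returns the placement with the smallest
// sum of squared differences (the first such placement if there are ties).
Match findTemplate(const Gray& image, const Gray& templ) {
    Match best{0, 0, std::numeric_limits<long long>::max(), false};
    if (image.empty() || templ.empty() || templ[0].empty()) return best;

    const std::size_t ih = image.size(), iw = image[0].size();
    const std::size_t th = templ.size(), tw = templ[0].size();
    if (th > ih || tw > iw) return best;  // template larger than the image

    for (std::size_t y = 0; y + th <= ih; ++y) {
        for (std::size_t x = 0; x + tw <= iw; ++x) {
            long long sum = 0;
            for (std::size_t ty = 0; ty < th; ++ty) {
                for (std::size_t tx = 0; tx < tw; ++tx) {
                    const long long d = image[y + ty][x + tx] - templ[ty][tx];
                    sum += d * d;
                }
            }
            if (sum < best.sqDiff) best = Match{x, y, sum, true};
        }
    }
    return best;
}
```

Running `findTemplate` once per known letter and sorting the good matches by their `x` position would give the letters in reading order. In practice the library version is preferable, since it is optimized and also offers normalized scoring methods that are less sensitive to brightness differences.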

For some tutorials to get into the library see: http://docs.opencv.org/doc/tutorials/tutorials.html

ypnos
  • Thank you very much indeed. This looks like everything I could have hoped for. I've marked this as the accepted answer. Kudos to you. – John Torwalds Feb 17 '13 at 16:15