I am trying to extract the circled word from an image containing several different words.

For example, in this image:

[Image: Words]

The "MAMBAHUNT" word should be extracted because it's circled.

[Image: Extracted word]

My strategy so far is to find the straight lines in the image. Once I have them, I can compute their intersections to find the corners and crop out the desired region.
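To make the intersection step concrete, here is a minimal sketch in Python (used only for illustration; the loops port directly to PHP). It assumes the detected lines are axis-aligned and already available as tuples, which the question does not state. The names `intersections` and `crop_box` and the tuple layouts are hypothetical.

```python
def intersections(horizontal, vertical):
    """Return the (x, y) points where horizontal and vertical segments cross.

    horizontal: list of (y, x_start, x_end) tuples (assumed layout)
    vertical:   list of (x, y_start, y_end) tuples (assumed layout)
    """
    points = []
    for y, x0, x1 in horizontal:
        for x, y0, y1 in vertical:
            # The segments cross when each one spans the other's fixed coordinate.
            if x0 <= x <= x1 and y0 <= y <= y1:
                points.append((x, y))
    return points


def crop_box(points):
    """Return (left, top, right, bottom) enclosing all corner points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys), max(xs), max(ys))


# Two horizontal and two vertical segments forming a box.
corners = intersections([(10, 5, 50), (30, 5, 50)], [(5, 10, 30), (50, 10, 30)])
box = crop_box(corners)
```

The resulting box could then be passed to whatever cropping routine the image library provides.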

The way I've been trying to get the straight lines is by looping through each pixel and finding where several of them in a row have the same color. However, that gives false positives because some of the words would meet this criterion.
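The pixel loop described above might look like the following minimal sketch, written in Python for illustration. It assumes the image has already been read into a rectangular list of rows of color values (in PHP, for example, by calling GD's `imagecolorat` per pixel). The function name and the `min_length` threshold are hypothetical.

```python
def find_horizontal_runs(pixels, min_length):
    """Return (row, start_col, end_col, color) for every horizontal run of
    identical pixels that is at least min_length long (end_col inclusive)."""
    runs = []
    for y, row in enumerate(pixels):
        start = 0
        for x in range(1, len(row) + 1):
            # Close the current run at the end of the row or on a color change.
            if x == len(row) or row[x] != row[start]:
                if x - start >= min_length:
                    runs.append((y, start, x - 1, row[start]))
                start = x
    return runs


sample = [
    [0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
]
runs = find_horizontal_runs(sample, 3)
```

Note that this reports any sufficiently long flat-colored stretch, including background and wide letter strokes, which is the source of the false positives.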

Is there a better way to find straight lines in an image using PHP? Or a different strategy for extracting the circled word?

dangson
  • There are a couple of OCR libraries for PHP, but I have never tried them and don't know whether they are any good. See, for example, http://ocrsdk.com/documentation/quick-start/text-fields/?utm_source=stackoverflow.com&utm_medium=comment&utm_campaign=smm – herrjeh42 Mar 27 '13 at 06:55
  • I'm using Tesseract for OCR and it's actually quite good. However, I'm trying to isolate the circled region before running it through OCR. Otherwise all the other words would come through as well. – dangson Mar 27 '13 at 13:09
  • This post mentions a few algorithms for detecting lines: http://stackoverflow.com/questions/11307219/recognize-pattern-in-images, and there is another one specific to PHP: http://stackoverflow.com/questions/4142271/how-to-detect-a-partial-vertical-horizontal-line-in-a-image – herrjeh42 Mar 27 '13 at 14:16
  • And what about this one? http://stackoverflow.com/questions/8881326/hough-transform-with-php – herrjeh42 Mar 27 '13 at 14:18

1 Answer

Trying to defy a captcha? :)

You can use the same algorithm to find all white lines in the image, then check whether each gray line is adjacent to a white line. If it is not, it is a false positive. This is not incredibly efficient, though.
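A minimal sketch of that filter, in Python for illustration, is below. It makes several assumptions not stated in the answer: the image is a rectangular grid of grayscale values, gray is 128 and white is 255, and "adjacent" means the row directly above or below is white across the whole span of the gray run. All names are hypothetical.

```python
GRAY, WHITE = 128, 255  # assumed grayscale values; adjust to the real image


def runs_of(pixels, color, min_length):
    """Yield (row, start_col, end_col) for horizontal runs of `color`."""
    for y, row in enumerate(pixels):
        start = None
        # The trailing None is a sentinel that closes a run ending at the edge.
        for x, value in enumerate(list(row) + [None]):
            if value == color:
                if start is None:
                    start = x
            elif start is not None:
                if x - start >= min_length:
                    yield (y, start, x - 1)
                start = None


def borders_white(pixels, run):
    """True if the row above or below is white across the run's columns."""
    y, x0, x1 = run
    for ny in (y - 1, y + 1):
        if 0 <= ny < len(pixels) and all(v == WHITE for v in pixels[ny][x0:x1 + 1]):
            return True
    return False


def gray_lines_next_to_white(pixels, min_length):
    """Keep only the gray runs that touch a white line; drop the rest."""
    return [r for r in runs_of(pixels, GRAY, min_length) if borders_white(pixels, r)]


sample = [
    [255, 255, 255, 255, 0],
    [128, 128, 128, 128, 0],
    [0, 0, 0, 0, 0],
    [128, 128, 128, 0, 0],
    [0, 0, 0, 0, 0],
]
kept = gray_lines_next_to_white(sample, 3)
```

Here the gray run in the second row is kept because it sits under a white run, while the one in the fourth row is discarded. Every candidate triggers a second scan of its neighboring rows, which is where the extra cost comes from.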

Ronald Paul