[This is of interest to us.] I am assuming your input is effectively a bitmap - a rectangular matrix of pixels. The first question is whether it is aligned with the axes - if it has been scanned, it probably is not. You may need a deskewing algorithm (this is rather dated, but it's a useful start: http://www.eecs.berkeley.edu/~fateman/kathey/node11.html)
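A minimal sketch of one common deskewing idea, assuming a binarized numpy array `img` with foreground pixels set to 1; the angle range and step size are arbitrary choices for illustration, not a tuned recommendation.

    import numpy as np
    from scipy.ndimage import rotate

    def estimate_skew(img, max_angle=5.0, step=0.1):
        """Return the rotation angle (degrees) that maximizes the variance
        of the horizontal projection profile - it is sharpest when text
        rows line up with the image rows."""
        best_angle, best_score = 0.0, -1.0
        for angle in np.arange(-max_angle, max_angle + step, step):
            rotated = rotate(img, angle, reshape=False, order=0)
            profile = rotated.sum(axis=1)      # foreground pixels per row
            score = np.var(profile)            # peaky profile => well aligned
            if score > best_score:
                best_angle, best_score = angle, score
        return best_angle

    def deskew(img):
        return rotate(img, estimate_skew(img), reshape=False, order=0)

This brute-force search is slow but easy to reason about; for production you would refine the angle in two passes (coarse, then fine) or estimate it from detected lines instead.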
The classic approach to line detection is the Hough transform (http://en.wikipedia.org/wiki/Hough_transform), though our current collaborators do better than this for simple boxes by projecting pixels onto different viewpoints - similar to tomography. Rotate the image and count the density/histogram of points along the projection lines. For simple boxes that gives a clear signal.
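A minimal sketch of the projection idea, again assuming a binarized numpy array `img` (already deskewed, so boxes are axis-aligned); the threshold fraction is an illustrative placeholder.

    import numpy as np

    def box_edge_candidates(img, threshold_frac=0.5):
        """Return candidate row/column indices for box edges: rows or columns
        whose pixel density is a large fraction of the strongest peak."""
        row_profile = img.sum(axis=1)   # density along horizontal lines
        col_profile = img.sum(axis=0)   # density along vertical lines
        rows = np.where(row_profile > threshold_frac * row_profile.max())[0]
        cols = np.where(col_profile > threshold_frac * col_profile.max())[0]
        return rows, cols

For lines at arbitrary angles you would sweep the projection direction (rotate and re-project), which is essentially what the Hough transform does in accumulator form; OpenCV ships ready-made implementations if you would rather not roll your own.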
For the text I suspect you either have to have a set of likely fonts or to use machine learning. In the latter case you have to devise features and then select a series of images that humans have classified as text or not-text. Your algorithm (and there are many: neural nets, maximum entropy, etc.) is then trained against these.
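A minimal sketch of that training setup, assuming scikit-learn and a human-labelled set of equal-sized image patches (`patches`, a list of 2-D numpy arrays) with labels 1 = text, 0 = not-text; the features here are illustrative placeholders, not a recommendation.

    import numpy as np
    from sklearn.linear_model import LogisticRegression  # ~ maximum entropy

    def features(patch):
        """Crude hand-devised features for a single patch."""
        return np.array([
            patch.mean(),               # overall ink density
            patch.sum(axis=0).std(),    # column-profile variation
            patch.sum(axis=1).std(),    # row-profile variation
        ])

    def train_text_classifier(patches, labels):
        X = np.array([features(p) for p in patches])
        return LogisticRegression().fit(X, labels)

You would then slide the classifier over candidate regions of a new page and keep the patches it labels as text; the real work is in choosing good features and getting enough labelled examples.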
The quality of the pixel map makes a great deal of difference. Documents from 20 years ago are much harder than bitmaps of documents created through drawing programs and dumped as PDF (of course, if you can interpret the text in the PDF, that helps a good deal).
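A minimal sketch of checking for an embedded text layer before falling back to pixel-level analysis, assuming the pdfminer.six package and a hypothetical file name; a PDF exported from a drawing program often carries its text intact.

    from pdfminer.high_level import extract_text

    text = extract_text("document.pdf")   # hypothetical input file
    if not text.strip():
        # no usable text layer - rasterize and fall back to image analysis
        print("No embedded text found; treating the PDF as a bitmap")
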