I want to make a program that takes an image as input and outputs text. I know that I can use a neural network to turn an image of a single character into that character. The difficult part is: given an image with text in it, how would I produce the rectangles around each individual character? What method could I use to do it?
3 Answers
A basic approach is to make a histogram of black pixels. First, project all pixels onto a vertical axis, i.e. count the black pixels in each row. The deep valleys in the histogram indicate the separation between lines of text (try different angles if the paper might be tilted). Then, per line (or per page if you know the font is monospaced), project the pixels onto a horizontal histogram, i.e. count the black pixels in each column. This will give you a strong indication of the inter-character spaces. At a minimum, this gives you a value for the average character height and width that will help you in the next steps.
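As a rough illustration of the two projections, here is a minimal sketch. It assumes the image has already been binarized into a NumPy array where 1 means ink and 0 means background; the function names `find_runs` and `segment_by_projection` and the `threshold` parameter are made up for this example. A real scan would likely need a threshold above zero, because noise rarely leaves the valleys completely empty.

```python
import numpy as np

def find_runs(profile, threshold=0):
    """Return (start, end) pairs of consecutive entries above threshold; end is exclusive."""
    runs = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a run of ink begins here
        elif value <= threshold and start is not None:
            runs.append((start, i))        # a valley ends the run
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs

def segment_by_projection(binary, threshold=0):
    """Split a binary page (1 = ink) into boxes (top, bottom, left, right).

    bottom and right are exclusive. Lines come from the row histogram,
    characters from the column histogram of each line.
    """
    binary = np.asarray(binary)
    boxes = []
    row_profile = binary.sum(axis=1)       # ink count per row
    for top, bottom in find_runs(row_profile, threshold):
        line = binary[top:bottom]
        col_profile = line.sum(axis=0)     # ink count per column, this line only
        for left, right in find_runs(col_profile, threshold):
            boxes.append((top, bottom, left, right))
    return boxes
```

The widths and heights of the returned boxes can then be averaged to get the character size estimate mentioned above.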
After that, you need to take care of kerning (where the bounding boxes of adjacent characters overlap, so there is no clean vertical gap between them). Find the connected pixels, possibly after first applying dilation or erosion to the image to compensate for scanning artifacts.
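Finding connected pixels can be sketched with a plain flood fill. This is only an illustration under some assumptions: the input is a binarized grid (rows of 0/1 values, 1 = ink), pixels count as connected when they touch in any of the 8 directions, and the function name `connected_component_boxes` is invented here. The dilation or erosion step is left out, and characters made of separate strokes (such as "i") would come back as more than one box and need merging afterwards.

```python
from collections import deque

def connected_component_boxes(binary):
    """Return one box (top, bottom, left, right) per 8-connected blob of ink.

    bottom and right are exclusive. Boxes are listed in the order their first
    pixel is met when scanning row by row.
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for r in range(height):
        for c in range(width):
            if not binary[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from this unvisited ink pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and binary[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((top, bottom + 1, left, right + 1))
    return boxes
```

Unlike the column histogram, this separates two shapes whose column ranges overlap, as long as their pixels do not actually touch.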
Depending on the quality of the scanned image, you may have to use more advanced techniques, but this will get you going.

This is very interesting because, while I think the method you describe will work quite well sometimes, it cannot learn? The neural network can be trained to get better at reading individual symbols but once it's perfect, using your ideas, I feel like perhaps it would be limited by this part of the procedure. Do you think that is the case or am I misjudging? – quanta Jul 08 '11 at 09:34
Ah, I slightly misread your question. The traditional approach is to do 1) image enhancement 2) segmentation 3) character recognition (using NN) 4) use context information (dictionary lookup or applying statistical data). You basically have the choice to do segmentation using NN or combine 2) and 3) using NN. The latter will be challenging but has potential advantages. If you want to apply NN to segmentation, you'll have to come up with good features. Using the histogram valleys might be one of them (I haven't done this myself so really cannot predict the outcome). – Emile Jul 08 '11 at 18:40
Congratulations! You've almost reinvented Hough transform. – polkovnikov.ph Mar 28 '16 at 02:09
This doesn't sound like artificial intelligence; it sounds like you're talking about OCR:
http://en.wikipedia.org/wiki/Optical_character_recognition
See Google's Tesseract:
http://code.google.com/p/tesseract-ocr/
EDIT: The original, unedited question asked about artificial intelligence.

@quanta AI and OCR are not the same thing. By calling it AI, you're describing the wrong thing. – Raoul Jul 04 '11 at 08:25
To me, the question itself does not seem clear.
Since it concerns OCR, I will leave a couple of articles here that may help (they helped me, at least):
Also, as mentioned above, Tesseract is a good open-source OCR engine that can be used from Python through wrappers such as pytesseract (it is the one I personally use as well). Another approach you could take is through sklearn.
You may also want to check this Stack Overflow post.
I am also pretty sure that you can use ResearchGate to look for papers on the topic (I found some, but I am not sure they are what you need).
I think that the above generic answer suits the generic question.

Hi, typically on SO if a question is not clear, then it's best not to answer it until it is made clear. Doubly so if you think there's already an answer posted that serves the question. – TylerH Feb 04 '20 at 21:39