I want to detect the text area in an image as a preprocessing step for the Tesseract OCR engine. The engine works well when the input is text only, but when the input image contains non-text content it fails. So I want to detect only the text content in the image. Any idea of how to do that would be helpful, thanks.
-
I would go for an image processing solution. Try googling for background-removal techniques. – Israel Unterman Apr 18 '12 at 09:32
-
It is difficult to understand your problem without an example image. Please upload an image to imageshack.us and provide the link here. – Abid Rahman K Apr 18 '12 at 18:02
-
OK, this is the link to a sample image I want to remove the non-text area from: http://imageshack.us/photo/my-images/171/img0052ir.jpg/ But I think that Tesseract manages the whole process on its own, so we shouldn't have to care about what the image looks like. – chostDevil Apr 19 '12 at 06:51
-
Why are you posting multiple questions? – vini Apr 19 '12 at 15:37
4 Answers

-
What about the non-text region in the scanned image? (i.e. when I apply an erosion to the input image, will the non-text regions in the input image be neglected?) – chostDevil Apr 19 '12 at 17:11
-
When you have a bounding box you can extract its content to a new image and forget about everything else that is not inside the box. For this task, search our forum for **Region Of Interest** or **ROI** in the OpenCV tag. – karlphillip Apr 19 '12 at 17:14
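For reference, extracting an ROI once you have a bounding box is a one-liner with NumPy slicing in OpenCV's Python bindings. This is only a small sketch; the file names and box coordinates below are placeholders, not values from the thread.

```python
import cv2

img = cv2.imread("scan.jpg")
x, y, w, h = 50, 80, 400, 120        # hypothetical bounding box of the text region
roi = img[y:y + h, x:x + w]          # crop the region of interest via array slicing
cv2.imwrite("text_only.jpg", roi)    # this crop can then be fed to Tesseract
```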
-
If there's any technique more accurate than this, please let me know, and thanks very much :) – chostDevil Apr 19 '12 at 18:08
-
I see in the above picture that the text is one chunk (grouped in one area). Will this technique work with separated groups of lines (i.e. a business card)? – chostDevil Apr 19 '12 at 22:43
-
What you are trying to accomplish is not easy, Patrick, and this is not a copy/paste solution. It's great because it shares an approach on how to deal with your problem. But you still need to work on it and improve it in order to achieve your desired result. – karlphillip Apr 19 '12 at 22:48
-
Sorry, I didn't understand. Could you tell me the difference between the algorithm listed above and the one I'll need to remove the non-text area from a business card? – chostDevil Apr 19 '12 at 23:32
-
The algorithm above was made to detect only one group of text in an image. You'll have to change it a little bit so it will detect more groups. – karlphillip Apr 19 '12 at 23:34
-
is the text inside a business card considered to be multiple groups? – chostDevil Apr 20 '12 at 09:09
-
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/10327/discussion-between-patrick-jones-and-karlphillip) – chostDevil Apr 20 '12 at 14:59
Well, I'm not very experienced in image processing, but I hope I can help you with my theoretical approach.
In most cases, text forms parallel, horizontal rows, and the space between rows contains lots of background pixels. This can be exploited to solve the problem. So... if you collapse each pixel row of the image into a single value (for example, by summing its pixels), you'll get a one-pixel-wide image as output. When the input image contains text, the output will very likely show a periodic pattern, where dark areas are repeatedly followed by brighter areas. These "groups" of darker pixels indicate the position of the text content, while the brighter "groups" indicate the gaps between the individual rows. You'll probably find that the brighter areas are much smaller than the others. Text is much more regular than most other picture elements, so it should be easy to separate.
You have to implement a procedure to detect these periodic recurrences. Once the script can determine that the input picture has these characteristics, there's a high chance that it contains text. (However, this approach can't distinguish between actual text and simple horizontal stripes...)
For the next step, you must find a way to determine the boundaries of the paragraphs using the above-mentioned method. I'm thinking of a pretty simple algorithm, which would divide the input image into smaller, narrow stripes (50-100 px) and check these areas separately. Then it would compare the results to build a map of the possible areas filled with text. This method wouldn't be very accurate, but that probably won't bother the OCR system.
And finally, you need to use the text map to run the OCR on the desired locations only.
On the other hand, this method will fail if the input text is rotated by more than ~3-5 degrees. There's another drawback: if you have only a few rows, the pattern search will be very unreliable. More rows, more accuracy...
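To make the idea concrete, here is a minimal sketch of the row-projection approach, assuming OpenCV, NumPy and pytesseract are available. The file name and threshold values are illustrative assumptions, and it only segments vertically (so it shares the rotation limitation mentioned above).

```python
import cv2
import numpy as np
import pytesseract

# Load the scan and binarize it (dark text on a light background assumed)
img = cv2.imread("card.jpg", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Collapse each pixel row into a single value: the horizontal projection profile.
# Rows that cross text contain many foreground pixels; gaps between lines contain few.
row_profile = binary.sum(axis=1) / 255          # number of "ink" pixels per row

# Mark rows whose ink count exceeds a heuristic fraction of the busiest row
threshold = 0.05 * row_profile.max()
text_rows = row_profile > threshold

# Group consecutive text rows into bands; each band is a candidate text line/paragraph
bands, start = [], None
for y, is_text in enumerate(text_rows):
    if is_text and start is None:
        start = y
    elif not is_text and start is not None:
        bands.append((start, y))
        start = None
if start is not None:
    bands.append((start, len(text_rows)))

# Run the OCR only on the detected bands (with a small margin)
for y0, y1 in bands:
    pad = 5
    roi = img[max(0, y0 - pad):min(img.shape[0], y1 + pad), :]
    print(pytesseract.image_to_string(roi))
```

For multiple groups (e.g. a business card), the same profiling can be repeated on narrow vertical stripes, as suggested above, and the per-stripe results merged into a text map.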
Regards, G.

I am new to stackoverflow.com, but I wrote an answer to a question similar to this one which may be useful to any readers who share this question. Whether or not the question is actually a duplicate, since this one was first, I'll leave up to others. If I should copy and paste that answer here, let me know. I also found this question first on Google rather than the one I answered, so linking it here may benefit more people, especially since it provides different ways of going about getting text areas. For me, when I looked up this question, it did not fit my problem case.
At the current time, the best way to detect text is by using EAST (An Efficient and Accurate Scene Text Detector).
The EAST pipeline is capable of predicting words and lines of text at arbitrary orientations on 720p images, and furthermore, can run at 13 FPS, according to the authors.
EAST quick start tutorial can be found here
EAST paper can be found here
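For readers who want to try it, below is a rough sketch of running the pre-trained EAST model through OpenCV's `dnn` module, which is the approach the usual quick-start tutorials take. The model file name, image name, input size and thresholds are assumptions; the frozen model has to be downloaded separately, and rotated-box handling is simplified to axis-aligned rectangles.

```python
import cv2
import numpy as np

# Pre-trained EAST model (frozen TensorFlow graph), downloaded separately
net = cv2.dnn.readNet("frozen_east_text_detection.pb")

image = cv2.imread("card.jpg")
(H, W) = image.shape[:2]
newW, newH = 320, 320                         # EAST needs dimensions divisible by 32
rW, rH = W / float(newW), H / float(newH)

blob = cv2.dnn.blobFromImage(image, 1.0, (newW, newH),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                "feature_fusion/concat_3"])

# Decode the score/geometry maps into boxes (rotation ignored for brevity)
boxes, confidences = [], []
numRows, numCols = scores.shape[2:4]
for y in range(numRows):
    s = scores[0, 0, y]
    d0, d1, d2, d3 = (geometry[0, 0, y], geometry[0, 1, y],
                      geometry[0, 2, y], geometry[0, 3, y])   # distances to box edges
    angles = geometry[0, 4, y]
    for x in range(numCols):
        if s[x] < 0.5:                        # confidence threshold (tunable)
            continue
        offsetX, offsetY = x * 4.0, y * 4.0   # output maps are 4x smaller than the input
        cos, sin = np.cos(angles[x]), np.sin(angles[x])
        h = d0[x] + d2[x]
        w = d1[x] + d3[x]
        endX = int(offsetX + cos * d1[x] + sin * d2[x])
        endY = int(offsetY - sin * d1[x] + cos * d2[x])
        boxes.append((endX - int(w), endY - int(h), int(w), int(h)))
        confidences.append(float(s[x]))

# Non-maximum suppression to merge overlapping detections
indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(indices).flatten():
    bx, by, bw, bh = boxes[i]
    # Scale back to the original image size; crop these regions for Tesseract, or draw them
    cv2.rectangle(image, (int(bx * rW), int(by * rH)),
                  (int((bx + bw) * rW), int((by + bh) * rH)), (0, 255, 0), 2)
cv2.imwrite("east_boxes.jpg", image)
```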
