Detecting article zones of a newspaper (text blocks)

Question

I need to automatically detect all the text zones of a newspaper page, based on edges (new ideas are welcome).

The result I need is a set of TXT files, each containing one article. Take a look at this demonstration.

I'm assuming that you've done some research on this topic before coming here and have tried a thing or two out, right? What did you try, and what was the result? — Hovercraft Full Of Eels, Jun 10 '16 at 00:23
Yes, I did, but I found nothing that helped me achieve the result; everything I found was for C# :'( All I need is for my application to detect that there are 5 articles on this page and tell them apart, then take the title and the body text of each one and save them in a TXT file. – Algerowalid, Jun 10 '16 at 00:29
The implementation language is irrelevant: this is an algorithm problem that could be handled in any language. There are papers that cover this; I'd suggest either searching the web, or getting a membership to something like the ACM's Digital Library--most of the papers there are available for free elsewhere, but they're much easier to find in the ACM DL. — Dave Newton, Jun 10 '16 at 03:54

score 0 · Answer 1 · edited May 23 '17 at 10:34

This question is way too broad.

If you want a proper answer then you need to ask very specific questions and show us what you have tried. We don't even know what formats you have to work with so we cannot offer any real help other than guessing.

Having said that:

You probably want to look into using image recognition software.

A good API to look into is OpenCV: http://opencv.org/

Here is a tutorial on how to use OpenCV with Java: http://docs.opencv.org/3.0-last-rst/doc/tutorials/introduction/desktop_java/java_dev_intro.html

And here are two similar questions that may help you:

Finding location of rectangles in an image with OpenCV

How to recognize rectangles in this image?
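
If you would rather start from the algorithm than from a library, one classic layout-analysis idea is the recursive XY-cut: split the page wherever a wide band of white space runs all the way across a region, then repeat inside each piece. Below is a minimal sketch in plain Java, not a finished solution. It assumes you have already rendered the page to an image and thresholded it into a `boolean[][]` where `true` means ink; that step is not shown. The names `XYCut`, `segment` and `minGap` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Recursive XY-cut sketch: splits a binarized page at wide blank gaps. */
class XYCut {

    /** Returns blocks as {top, left, bottom, right}; bottom and right are exclusive. */
    static List<int[]> segment(boolean[][] ink, int minGap) {
        List<int[]> blocks = new ArrayList<>();
        if (ink.length > 0) {
            cut(ink, 0, 0, ink.length, ink[0].length, minGap, blocks);
        }
        return blocks;
    }

    private static void cut(boolean[][] ink, int top, int left, int bottom, int right,
                            int minGap, List<int[]> out) {
        // Shrink the region to its inked content so that every remaining gap is interior.
        while (top < bottom && rowBlank(ink, top, left, right)) top++;
        while (bottom > top && rowBlank(ink, bottom - 1, left, right)) bottom--;
        if (top == bottom) return; // nothing but white space
        while (left < right && colBlank(ink, left, top, bottom)) left++;
        while (right > left && colBlank(ink, right - 1, top, bottom)) right--;

        // Look for a run of at least minGap blank rows and split there.
        int gapStart = -1;
        for (int y = top; y < bottom; y++) {
            if (rowBlank(ink, y, left, right)) {
                if (gapStart < 0) gapStart = y;
            } else {
                if (gapStart >= 0 && y - gapStart >= minGap) {
                    cut(ink, top, left, gapStart, right, minGap, out);
                    cut(ink, y, left, bottom, right, minGap, out);
                    return;
                }
                gapStart = -1;
            }
        }

        // No horizontal cut found: look for a run of blank columns instead.
        gapStart = -1;
        for (int x = left; x < right; x++) {
            if (colBlank(ink, x, top, bottom)) {
                if (gapStart < 0) gapStart = x;
            } else {
                if (gapStart >= 0 && x - gapStart >= minGap) {
                    cut(ink, top, left, bottom, gapStart, minGap, out);
                    cut(ink, top, x, bottom, right, minGap, out);
                    return;
                }
                gapStart = -1;
            }
        }

        // No gap wide enough in either direction: treat the region as one block.
        out.add(new int[] {top, left, bottom, right});
    }

    private static boolean rowBlank(boolean[][] ink, int y, int left, int right) {
        for (int x = left; x < right; x++) {
            if (ink[y][x]) return false;
        }
        return true;
    }

    private static boolean colBlank(boolean[][] ink, int x, int top, int bottom) {
        for (int y = top; y < bottom; y++) {
            if (ink[y][x]) return false;
        }
        return true;
    }
}
```

Calling `XYCut.segment(page, minGap)` gives you one bounding box per block, which you could then crop and pass to whatever OCR or text extraction you use. `minGap` has to be larger than the normal spacing between lines and words, otherwise every line becomes its own block. This only handles layouts that can be separated by straight white-space bands; articles that wrap around each other or are divided only by ruled lines would need something more, such as the contour-based approaches in the links above.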

answered Jun 10 '16 at 03:50 by sorifiend · edited May 23 '17 at 10:34 by Community

Thank you very much for your answer, and sorry about my question. What I want to do is: (1) take a PDF file (specifically, a newspaper document); (2) have the application automatically recognize the newspaper articles; (3) extract each article and save it to an XML document (this is the easiest part, lol). So how can I use a library to first recognize the text blocks (newspaper articles), and then get the text and its position? Thank you very much for the links, I'm going to try them now. – Algerowalid Jun 10 '16 at 13:11
