I'm working with images from which I'd like to extract certain parts and combine them into one new image. I can use ImageMagick or OpenCV. Here is a sample image:
From this image I would like to extract the title, the two annotated texts (one in a circle, one in a rectangle), and the text at the bottom.
So the final image would contain: Image Title, Annotated Text1, Annotated Text, and This is some test. These parts don't have to be in any particular order in the new image.
Questions
- What kind of strategy can I use to do this?
- Will Hough or Canny help?
- Since the parts of the image I want are all text, I'm thinking that maybe a Hough line transform can detect the straight lines, and then I can crop out those parts of the image...
- My main goal is to extract the text so I can send it to an OCR engine.
I tried eroding the image and came up with this:
My Strategy
The following is my strategy for keeping only the parts of the image that have a white background and text. However, I'm not sure whether this is doable with OpenCV...
There will be different ROIs in the image:
- There will always be a white background at the top of the image; let's call this space the title. So I crop out the rectangle at the top of the image and save it as a separate image.
- There will always be a white background at the bottom of the image; let's call this the body. So I crop out the rectangle at the bottom of the image and save it as a separate image.
- There will be some text overlaid on the image; let's call this the annotated text. It will be in squares or circles. I can use the technique mentioned in this answer to crop out those parts of the image and save them as separate images.
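For the first two steps, this is a minimal sketch of how I imagine finding the white bands, using only NumPy on a grayscale array (which could come from `cv2.imread(path, cv2.IMREAD_GRAYSCALE)`). The function name `white_bands` and the values `white_thresh=240` and `min_fraction=0.6` are assumptions of mine: a row counts as "white" if at least 60% of its pixels are near-white, so that rows containing some dark text still pass. The synthetic array only imitates the layout of my sample image.

```python
import numpy as np

def white_bands(gray, white_thresh=240, min_fraction=0.6):
    """Return (top_end, bottom_start) for a 2-D uint8 grayscale image.

    Rows [0, top_end) form the top white band (title) and rows
    [bottom_start, height) form the bottom white band (body).
    The thresholds are untuned guesses.
    """
    # Fraction of near-white pixels in every row.
    white_fraction = (gray >= white_thresh).mean(axis=1)
    is_white_row = white_fraction >= min_fraction
    # Indices of rows that are NOT mostly white (the picture area).
    non_white = np.flatnonzero(~is_white_row)
    if non_white.size == 0:
        # Whole image is white: everything is "title", no body.
        return gray.shape[0], gray.shape[0]
    return int(non_white[0]), int(non_white[-1]) + 1

# Synthetic stand-in: white page, gray "photo" in the middle, dark "text" bars.
gray = np.full((100, 80), 255, dtype=np.uint8)
gray[20:70, :] = 100    # picture area
gray[5:10, 10:30] = 0   # title text
gray[80:85, 10:30] = 0  # body text

top_end, bottom_start = white_bands(gray)
title = gray[:top_end]       # crop for the title
body = gray[bottom_start:]   # crop for the body
print(title.shape, body.shape)
```

The two crops could then be saved separately or stacked into a new image. This assumes the picture area spans the full width and is not itself mostly white, which may not hold for every image.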