How to remove rectangle shapes from image, keeping text, in Python3?

Question

I am trying to extract the text from flowcharts and decision trees. When I run text region detection on the original image, with the boxes/shapes still in place, the results are poor. Is there any way to remove these shapes while keeping the text?

You can use a Hough line detector to find all the straight lines, then fill those lines with the background color. – ZdaR, Apr 24 '18 at 03:37
I would probably use [shape detection](https://stackoverflow.com/a/11427501/6225741), then run OCR on each ROI? – Nayfe, Apr 24 '18 at 07:35
@Nayfe Some of the text is outside the boxes, so shape detection misses those regions. I will update the photo. – Bade, Apr 24 '18 at 11:59

score 1 · Answer 1 · answered Apr 25 '18 at 06:59

You could use connectedComponentsWithStats(). The chart lines will form a single connected component, so you can simply remove that component from the image.

fireant

Could you please elaborate a little? There is almost no documentation available on connectedComponentsWithStats for Python3. If Python3 is not your preferred language, perhaps you could outline the steps you envision for removing the rectangles from the above image. – Bade, May 07 '18 at 20:13