
My goal is to separate web pages into parts (header, contacts, footer, ...) with OpenCV in Python. I converted the image of the web page to grayscale and ran Canny edge detection on it. Here's the result:

[Images: original image converted to grayscale; Canny result]

As you can see, the borders between the parts are easy to detect with the human eye, and I think this should be a small problem for OpenCV, but I can't figure out how to export each part into a separate file (or at least get the lines' coordinates).

Here's my current code for grayscale + Canny:

import cv2
import numpy as np

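# Load the page screenshot directly as a single-channel grayscale image
img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
# Run Canny edge detection with low hysteresis thresholds (5 and 10)
edges = cv2.Canny(img, 5, 10)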
Shai

3 Answers

3

Use Hough lines and keep only the lines with slope = 0, i.e. the horizontal ones. You will find this YouTube video very helpful and interesting.

double-beep
Harshith Thota
3

All you need is to look at the statistics along the image's rows.
For instance, if you look at the mean intensity along the rows, boundary rows have a mean close to 1.0 (with intensities scaled to the range 0 to 1).
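The row statistics described above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `find_boundary_rows` and `split_at_rows` and the 0.9 threshold are hypothetical, the input is assumed to be an 8-bit image in which boundary lines are white, and the demo array is synthetic.

```python
import numpy as np

def find_boundary_rows(img, threshold=0.9):
    """Indices of rows whose mean intensity, scaled to 0..1, is >= threshold."""
    row_means = img.astype(np.float64).mean(axis=1) / 255.0
    return np.flatnonzero(row_means >= threshold)

def split_at_rows(img, boundary_rows):
    """Cut img into horizontal strips, dropping the boundary rows themselves."""
    is_boundary = np.zeros(img.shape[0], dtype=bool)
    is_boundary[boundary_rows] = True
    parts, start = [], None
    for y, flag in enumerate(is_boundary):
        if not flag and start is None:
            start = y                      # a new strip begins here
        elif flag and start is not None:
            parts.append(img[start:y])     # strip ends just above the boundary
            start = None
    if start is not None:                  # last strip runs to the bottom edge
        parts.append(img[start:])
    return parts

# Synthetic 10x8 "edge image" with white boundary lines on rows 3 and 7
demo_edges = np.zeros((10, 8), dtype=np.uint8)
demo_edges[3] = 255
demo_edges[7] = 255

rows = find_boundary_rows(demo_edges)
parts = split_at_rows(demo_edges, rows)
print(rows.tolist())                  # [3, 7]
print([p.shape[0] for p in parts])    # [3, 3, 2]
```

To export the parts, the same boundary rows could be used to slice the original image, and each strip written out with `cv2.imwrite`. Real Canny output rarely yields perfectly full-width lines, so the threshold would likely need tuning.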

Shai
  • Please forgive me if I'm so stupid but how can I get these "statistics"? :) – Điền Bá Quan Jul 10 '17 at 12:47
  • @ĐiềnBáQuan The mean is also known as the average. In other words, add all the pixels in a row and divide by the width (the number of pixels in a row), and you get a value between 0 and 255 (or between 0 and 1.0 if you have floats or doubles). The white horizontal lines should have a value close to 255 (or 1.0), since most of their pixels will be white. – api55 Jul 10 '17 at 12:55
  • Thank you very much! I have achieved my goal. :) – Điền Bá Quan Jul 10 '17 at 13:26
  • @ĐiềnBáQuan Please do not use the term "stupid" here: you are doing nice image processing work. You are definitely **NOT** stupid. – Shai Jul 10 '17 at 14:32
2

Here are some ways to get the white lines:

OpenCV's HoughLines and HoughLinesP are good starting points.

Mateen Ulhaq