1

I have a bunch of images like this one (Table Sample).

What would be a good way to extract just the table structure from the image? I'm only interested in extracting the straight lines.

I have been toying around with the OpenCV Finding Contours code sample, and the results are quite promising. I'm just wondering whether there is a better way.

chhenning
  • Maybe you could also try [HoughLineTransform](http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_houghlines/py_houghlines.html), get all **horizontal lines**, and derive the ROI from the minimum y and maximum x coordinates (basically two diagonal corners of the ROI, which is a rectangle here) – Rick M. Jul 13 '17 at 18:18
  • I have tried http://docs.opencv.org/2.4/doc/tutorials/imgproc/imgtrans/hough_lines/hough_lines.html but the result is pretty bad. – chhenning Jul 13 '17 at 18:38
  • OK, that is strange. So if I understand correctly, you want to extract just the table in between, right? – Rick M. Jul 14 '17 at 08:04
  • I'd just like to extract the grid of horizontal and vertical lines. – chhenning Jul 14 '17 at 14:20
  • In that case you could also try [CCA](http://docs.opencv.org/3.1.0/d3/dc0/group__imgproc__shape.html#gae57b028a2b2ca327227c2399a9d53241) – Rick M. Jul 14 '17 at 14:32
  • This looks like it's worth a try! https://stackoverflow.com/questions/10196198/how-to-remove-convexity-defects-in-a-sudoku-square/10226971#10226971 – chhenning Mar 13 '18 at 20:44

2 Answers

6

OpenCV has a nice way to detect line segments: the line segment detector (LSD). Here is a code snippet in Python:

```python
import math
import cv2

img = cv2.imread('page2.png')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Create the detector (0 means no refinement) and run it on the grayscale image.
# detect() returns a tuple; element 0 holds the segments, each as [[x0, y0, x1, y1]].
lsd = cv2.createLineSegmentDetector(0)
dlines = lsd.detect(gray)

for dline in dlines[0]:
    # Round the floating-point endpoints to integer pixel coordinates
    x0, y0, x1, y1 = (int(round(v)) for v in dline[0])
    # Draw the segment; the scalar 255 is read as BGR (255, 0, 0), i.e. blue
    cv2.line(img, (x0, y0), (x1, y1), 255, 1, cv2.LINE_AA)

    # print line segment length
    print(math.hypot(x0 - x1, y0 - y1))

cv2.imwrite('page2_lines.png', img)
```
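
Since the goal is only the grid of horizontal and vertical lines, the detected segments can be filtered by angle and length afterwards. The following is a minimal sketch, not part of OpenCV: the helper name `split_axis_aligned` and the defaults for `angle_tol_deg` and `min_length` are assumptions chosen for illustration and would need tuning per image. It works on plain `(x0, y0, x1, y1)` tuples, which could be built from the detector output with something like `[tuple(d[0]) for d in dlines[0]]`.

```python
import math


def split_axis_aligned(segments, angle_tol_deg=2.0, min_length=20.0):
    """Split (x0, y0, x1, y1) segments into near-horizontal and near-vertical lists.

    angle_tol_deg and min_length are illustrative defaults, not tuned values.
    """
    horizontal, vertical = [], []
    for x0, y0, x1, y1 in segments:
        length = math.hypot(x1 - x0, y1 - y0)
        if length < min_length:
            continue  # short segments are mostly text strokes, not table rules
        # Direction of the segment folded into [0, 180) degrees
        angle = math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0
        if angle <= angle_tol_deg or angle >= 180.0 - angle_tol_deg:
            horizontal.append((x0, y0, x1, y1))
        elif abs(angle - 90.0) <= angle_tol_deg:
            vertical.append((x0, y0, x1, y1))
    return horizontal, vertical


# Hand-written segments standing in for real detector output
sample = [
    (10, 50, 300, 51),     # long, nearly horizontal
    (40, 10, 40, 200),     # long, vertical
    (100, 100, 108, 104),  # short stroke, dropped by min_length
    (0, 0, 200, 200),      # long diagonal, dropped by the angle test
]
horizontal, vertical = split_axis_aligned(sample)
print(len(horizontal), len(vertical))
```

Folding the angle into [0, 180) makes the result independent of which endpoint the detector reports first.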
chhenning
1

Please take a look at my GitHub repository, Code for table extraction.

The code detects tables and extracts their contents while keeping the spatial coordinates intact.

[image]

The code detects the lines of a table, as shown in the image below. I hope it solves your problem.

[image]

The extracted output, in table form, is shown below.

[image]

Sunil Sharma