5

This is my first post on Stack Overflow, so I apologize if the problem isn't described clearly enough.

I am currently working on extracting table data from images, and I need a way to dilate the text in the vertical direction only, so that I get a clear representation of the columns that can be used for further segmentation.

After removing horizontal and vertical lines and transforming the image bitwise, I am at this stage:

Current state after dilation and line extraction

The ideal goal for this problem would be:

The goal

Is there a method or an algorithm that would be helpful in my case?

HansHirse
  • When using [`cv2.dilate`](https://docs.opencv.org/4.1.1/d4/d86/group__imgproc__filter.html#ga4ff0f3318642c4f469d0e11f242f3b6c), you can set up a custom `kernel`. Use a `3 x 1` (rows x columns) white rectangle here, and set `iterations` large enough. – HansHirse Nov 26 '19 at 11:42
  • Maybe you can dilate by a vertical kernel with width of 1. I don't know if it works, but I think that should dilate only on vertical direction. – dome Nov 26 '19 at 11:43
  • Thank you, I will try that. – Filip Jurković Nov 26 '19 at 11:46
  • You can also set a kernel with a reasonable height and perform only one iteration (see my answer below). – ndrplz Nov 26 '19 at 12:04
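
The two kernel suggestions in the comments above can be related with a small sketch. It re-implements a `3 x 1` dilation in plain NumPy purely for illustration (it is not OpenCV's code, and the helper names `dilate_vertical_3x1` and `dilate_vertical` are hypothetical). It assumes a non-negative image, so zero padding at the top and bottom borders does not affect the result. Each pass grows a white pixel by one row up and one row down, so `n` passes should behave like a single pass with a kernel of height `2 * n + 1` and width 1.

```python
import numpy as np


def dilate_vertical_3x1(img):
    """One dilation pass with a 3 x 1 (rows x columns) kernel, in plain NumPy."""
    # Pad one row of zeros above and below so border pixels have neighbours
    padded = np.pad(img, ((1, 1), (0, 0)), mode="constant", constant_values=0)
    # Each output pixel is the maximum of the pixel above, itself, and the pixel below
    return np.maximum.reduce([padded[:-2], padded[1:-1], padded[2:]])


def dilate_vertical(img, iterations):
    """Repeat the 3 x 1 pass; each iteration extends white regions by one row each way."""
    out = img
    for _ in range(iterations):
        out = dilate_vertical_3x1(out)
    return out


# Toy image: a single white pixel in the middle of a 7 x 3 array
img = np.zeros((7, 3), dtype=np.uint8)
img[3, 1] = 255

once = dilate_vertical(img, 1)   # white pixel now covers rows 2..4 of column 1
twice = dilate_vertical(img, 2)  # white pixel now covers rows 1..5 of column 1
```

Neighbouring columns stay untouched because the kernel is only one pixel wide, which is what restricts the growth to the vertical direction.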

2 Answers

6

You can call `cv2.dilate` with a structuring element that is one pixel wide and tall enough to span the whole image.

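import cv2

# Load the preprocessed image as a single-channel grayscale array
pre_img = cv2.imread('image.jpg', cv2.IMREAD_GRAYSCALE)
h, w = pre_img.shape

# ksize is (width, height): 1 pixel wide and 2 * h tall, so the kernel
# spans the full image height and the dilation acts only vertically
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, ksize=(1, 2 * h))

# A single pass is enough with a kernel this tall
dilated = cv2.dilate(pre_img, kernel)

cv2.imshow('input', pre_img)
cv2.imshow('output', dilated)
cv2.waitKey(0)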

Input: [input image]

Output: [output image]

To visualize better what's happening:

import numpy as np

# Average the input and the dilated result so both stay visible
blended = (pre_img.astype(float) + dilated.astype(float)) / 2
cv2.imshow('blended', blended.astype(np.uint8))
cv2.waitKey(0)

Blended image: [blend]

ndrplz
3

It looks like you don’t want a dilation, but a maximum projection. For each column, check to see if any pixel is set. Use numpy.any for that:

result = np.any(image, axis=0)
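
A minimal sketch of how this projection could feed the column segmentation mentioned in the question, assuming a toy binary array in place of the real image. The names `padded`, `changes`, and `column_spans` are illustrative only. `np.any(image, axis=0)` gives a 1D boolean array with one entry per column, and the positions where it flips between `False` and `True` mark the column boundaries.

```python
import numpy as np

# Toy binary image (hypothetical data): text pixels in columns 1-2 and in column 5
image = np.zeros((4, 7), dtype=np.uint8)
image[0, 1] = 255
image[2, 2] = 255
image[3, 5] = 255

# True for every column that contains at least one non-zero pixel
result = np.any(image, axis=0)

# Pad with False on both sides so runs touching the image border are closed
padded = np.concatenate(([False], result, [False]))

# Indices where the projection switches between empty and occupied
changes = np.flatnonzero(padded[1:] != padded[:-1])

# Pair them up as (first column, one past the last column) for each occupied run
column_spans = list(zip(changes[::2].tolist(), changes[1::2].tolist()))
```

Each span can then be used as a slice, for example `image[:, start:end]`, to crop one table column.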
Cris Luengo
  • This is much more efficient than using a convolution approach. Maybe add a `.reshape(image.shape)` at the end to get the 2D image, depending on the use case. – Rob May 26 '21 at 08:41
  • @Rob `np.tile` would be more suited to convert the projection back into a 2D image. Might be necessary or not depending on subsequent operations. Implicit broadcasting might make it unnecessary. – Cris Luengo May 26 '21 at 14:07
  • You are right, my mistake. It should be `np.tile(np.any(image, axis=0), (image.shape[0], 1))`. I agree that it is usually not needed. – Rob Jun 01 '21 at 13:46
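
The options discussed in these comments can be compared on a toy array. This is a sketch under the assumption of a small binary `uint8` image; the variable names are illustrative. `.reshape(image.shape)` cannot work because the projection holds only one value per column, whereas `np.tile` repeats that row once per image row, and broadcasting expands it implicitly.

```python
import numpy as np

# Toy binary image (hypothetical data): one white pixel in column 2
image = np.zeros((3, 4), dtype=np.uint8)
image[1, 2] = 255

# 1D boolean projection, shape (4,)
projection = np.any(image, axis=0)

# Explicit 2D copy: repeat the projection row once per image row, shape (3, 4)
tiled = np.tile(projection, (image.shape[0], 1))

# Read-only 2D view of the same data, without copying
view = np.broadcast_to(projection, image.shape)

# Convert the boolean mask to an 8-bit image for display or further processing
columns_img = tiled.astype(np.uint8) * 255

# Implicit broadcasting: the (4,) row is stretched over all rows automatically
overlay = np.maximum(image, projection.astype(np.uint8) * 255)
```

When the next step is an element-wise operation, as in the `np.maximum` line, the 1D projection is usually enough and the tiled copy can be skipped.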