
I have an image of a PDF page for which I want to separate paragraphs by drawing lines between them. The input is this: Sample input image

The output I would want is this: Output image

What I have done so far using OpenCV is convert the image to binary, apply a Gaussian blur, and dilate the image to get the following output: Intermediate output image

The code is as follows:

import cv2

img_path = r"C:\test\Samsung-file.JPG"
img_org = cv2.imread(img_path)

# Convert to grayscale:
gray = cv2.cvtColor(img_org, cv2.COLOR_BGR2GRAY)

# Blur and apply an adaptive threshold:
blur = cv2.GaussianBlur(gray, (3,3), 0)
thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY_INV,11,30)

# Dilate to combine adjacent text contours
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,1))
dilate = cv2.dilate(thresh, kernel, iterations=5)

Is there any method with which the lines can be drawn by identifying the space between white blocks (after dilation)? Any help would be greatly appreciated!

Sandeep

1 Answer


Here's a possible solution: First, get a segmentation mask of the text by applying an aggressive dilation operation with a big, rectangular structuring element. The idea is to get big blocks of text, so we can clearly see the separating gaps between them. Next, reduce the image to a MAX (255) column, where every value is the maximum pixel value of each of the dilated image's rows. If you invert the reduced image and find contours, you will get the spaces between the blocks of text you are looking for. Finally, get the average or middle point of each white space and draw a line at that vertical height.

Let's see the code:

# imports:
import cv2
import numpy as np

# Set image path
imagePath = "C://opencvImages//"
imageName = "PQZUL.jpg"

# Read image:
inputImage = cv2.imread(imagePath + imageName)
# Store a copy for results:
inputCopy = inputImage.copy()

# Convert BGR to grayscale:
grayInput = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)

# Threshold via Otsu
_, binaryImage = cv2.threshold(grayInput, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)

# Set kernel (structuring element) size:
kernelSize = (9, 9)

# Set operation iterations:
opIterations = 2

# Get the structuring element:
morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)

# Perform Dilate:
dilateImage = cv2.morphologyEx(binaryImage, cv2.MORPH_DILATE, morphKernel, 
                               None, None, opIterations, cv2.BORDER_REFLECT101)

This set of operations gets you a nice segmentation mask, like this:

Now, reduce this image to a MAX column. This is the vertical reduction of the image:

# Reduce matrix to an n rows x 1 column matrix:
reducedImage = cv2.reduce(dilateImage, 1, cv2.REDUCE_MAX)

# Invert the reduced image:
reducedImage = 255 - reducedImage

This is the result. It is hard to see here, but the image has been reduced to a single column, where each value is the maximum pixel intensity found in that particular image row:
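
Since a one-pixel-wide column is hard to display, you can tile it horizontally before showing it. This is just a quick visualization sketch, not part of the main pipeline (reducedVisual is an illustrative name):

# Visualization only: tile the (height x 1) column 100 times
# horizontally so the white gaps become visible on screen:
reducedVisual = np.tile(reducedImage, (1, 100))
cv2.imshow("reducedVisual", reducedVisual)
cv2.waitKey(0)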

Each white section is the "jump" of each text block to a new paragraph - these are the blobs (or contours) we are looking for. Let's find them and compute their bounding boxes:

# Find the big contours/blobs on the reduced (inverted) image:
contours, hierarchy = cv2.findContours(reducedImage, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)

# Store the poly approximation and bound
contoursPoly = [None] * len(contours)
separatingLines = [ ]

# We need some dimensions of the original image:
imageHeight = inputCopy.shape[0]
imageWidth = inputCopy.shape[1]

# Look for the outer bounding boxes:
for i, c in enumerate(contours):

    # Approximate the contour to a polygon:
    contoursPoly[i] = cv2.approxPolyDP(c, 3, True)

    # Convert the polygon to a bounding rectangle:
    boundRect = cv2.boundingRect(contoursPoly[i])

    # Get the bounding rect's data:
    [x,y,w,h] = boundRect

So far we have the coordinates of the bounding boxes; now we need a way to get the middle point of each box's vertical extent. There are a couple of solutions to this; I decided to just take the height of the bounding box and compute its middle coordinate. Here, still inside the for loop:

    # Calculate line middle (vertical) coordinate,
    # Start point and end point:
    lineCenter = y + (0.5 * h)
    startPoint = (0, int(lineCenter))
    endPoint = (int(imageWidth), int(lineCenter))

    # Store start and end points in list:
    separatingLines.append((startPoint, endPoint))

    # Draw the line:
    color = (0, 255, 0)
    cv2.line(inputCopy, startPoint, endPoint, color, 2)

    # Show the image:
    cv2.imshow("inputCopy", inputCopy)
    cv2.waitKey(0)

I've stored the start and end points in the separatingLines list, so you can retrieve the data later, if you need to. This is the result:
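
For example, the stored points can later be replayed onto a fresh copy of the input (a minimal usage sketch; freshCopy is just an illustrative name):

# Re-draw the stored separating lines on a clean copy of the input:
freshCopy = inputImage.copy()
for startPoint, endPoint in separatingLines:
    cv2.line(freshCopy, startPoint, endPoint, (0, 0, 255), 2)

cv2.imshow("freshCopy", freshCopy)
cv2.waitKey(0)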

stateMachine
  • Thank you, this works well with the current document. For documents with rotation I will need to draw horizontal lines; will the same approach work with minimal modification, or will it be a whole different approach? Any ideas? – Sandeep Mar 08 '21 at 09:57
  • @Sandeep It depends on the rotation angle. Due to the _image-to-column_ reduction I'm implementing here, the space between paragraphs may not be properly detected past a certain rotation angle. What is needed is a perspective correction method; the most robust approach is [perspective warping](https://stackoverflow.com/questions/22656698/perspective-correction-in-opencv-using-python). If the perspective exhibits only a 2D rotation, compensation via a [rotated rectangle](https://stackoverflow.com/questions/18207181/opencv-python-draw-minarearect-rotatedrect-not-implemented) might suffice (see the sketch below). – stateMachine Mar 08 '21 at 21:25
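
As a sketch of the rotated-rectangle compensation mentioned in the last comment (an illustrative outline assuming only a small in-plane rotation; minAreaRect's angle convention changed across OpenCV versions, so verify the sign on a sample page):

# Estimate the skew angle from the minimum-area rectangle that
# encloses all foreground pixels of the binary image.
# np.where yields (rows, cols); reverse to get (x, y) point pairs:
coords = np.column_stack(np.where(binaryImage > 0)[::-1]).astype(np.float32)
angle = cv2.minAreaRect(coords)[-1]

# Map the (0, 90] angle range (OpenCV 4.5+) to a small signed skew;
# older versions report negative angles instead, so adjust accordingly:
if angle > 45:
    angle = angle - 90

# Rotate the image back around its center to undo the skew:
(h, w) = binaryImage.shape[:2]
rotationMatrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
deskewedImage = cv2.warpAffine(binaryImage, rotationMatrix, (w, h),
                               flags=cv2.INTER_NEAREST,
                               borderMode=cv2.BORDER_CONSTANT, borderValue=0)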