Here's a possible solution. First, get a segmentation mask of the text. Apply an aggressive dilation with a large rectangular structuring element. The idea is to merge the text into big blocks, so the gaps separating them are clearly visible. Next, reduce the image to a single MAX column, where each value is the maximum pixel value of the corresponding row of the dilated image (255 wherever the row contains text). If you invert the reduced image and find contours, you get the spaces between the blocks of text you are looking for. Finally, take the middle point of each white space and draw a horizontal line at that vertical position.
Let's see the code:
# imports:
import cv2
import numpy as np
# Set image path
imagePath = "C://opencvImages//"
imageName = "PQZUL.jpg"
# Read image:
inputImage = cv2.imread(imagePath + imageName)
# Store a copy for results:
inputCopy = inputImage.copy()
# Convert BGR to grayscale:
grayInput = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
# Threshold via Otsu
_, binaryImage = cv2.threshold(grayInput, 0, 255, cv2.THRESH_BINARY_INV+cv2.THRESH_OTSU)
# Set kernel (structuring element) size:
kernelSize = (9, 9)
# Set operation iterations:
opIterations = 2
# Get the structuring element:
morphKernel = cv2.getStructuringElement(cv2.MORPH_RECT, kernelSize)
# Perform Dilate:
dilateImage = cv2.morphologyEx(binaryImage, cv2.MORPH_DILATE, morphKernel,
                               iterations=opIterations,
                               borderType=cv2.BORDER_REFLECT101)
This set of operations gets you a nice segmentation mask, like this:
Now, reduce this image to a MAX column. This is the vertical reduction of the image:
# Reduce the matrix to an n rows x 1 column matrix:
reducedImage = cv2.reduce(dilateImage, 1, cv2.REDUCE_MAX)
# Invert the reduced image:
reducedImage = 255 - reducedImage
This is the result. It is hard to see here, but the image has been reduced to a single column, where each value comes from the maximum pixel intensity found in that image row (inverted, so rows without text are white):
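To make the reduction easier to picture, here is a small NumPy-only sketch on a toy mask. The array `toyMask` and the names `toyColumn` and `toyInverted` are made up for illustration; for an 8-bit single-channel image, the row-wise `max` should give the same values as the `cv2.reduce` call above.

```python
import numpy as np

# Toy "dilated" mask: 5 rows x 4 columns, 255 = text, 0 = background.
toyMask = np.array([
    [0,   255, 255, 0],
    [0,   0,   255, 0],
    [0,   0,   0,   0],
    [0,   0,   0,   0],
    [255, 0,   0,   0],
], dtype=np.uint8)

# Row-wise maximum, kept as an n rows x 1 column array:
toyColumn = toyMask.max(axis=1, keepdims=True)

# Invert, so rows without any text become white (255):
toyInverted = 255 - toyColumn

print(toyInverted.ravel())  # rows 2 and 3 are the only blank rows
```

Rows 2 and 3 contain no text, so they are the only white entries in the inverted column; that white run is the kind of gap the contour step picks up.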
Each white section is the "jump" from one text block to a new paragraph. These are the blobs (or contours) we are looking for. Let's find them and compute their bounding boxes:
# Find the big contours/blobs on the inverted, reduced image:
contours, hierarchy = cv2.findContours(reducedImage, cv2.RETR_CCOMP, cv2.CHAIN_APPROX_SIMPLE)
# Store the poly approximations and the separating lines:
contoursPoly = [None] * len(contours)
separatingLines = []
# We need some dimensions of the original image:
imageHeight = inputCopy.shape[0]
imageWidth = inputCopy.shape[1]
# Look for the outer bounding boxes:
for i, c in enumerate(contours):
    # Approximate the contour to a polygon:
    contoursPoly[i] = cv2.approxPolyDP(c, 3, True)
    # Convert the polygon to a bounding rectangle:
    boundRect = cv2.boundingRect(contoursPoly[i])
    # Get the bounding rect's data:
    x, y, w, h = boundRect
So far we have the coordinates of the bounding boxes. Now we need a way to get the vertical middle point of each one. There are a couple of solutions to this; I decided to just take the height of the bounding box and compute its middle coordinate. Here, still inside the for loop:
    # Calculate line middle (vertical) coordinate,
    # Start point and end point:
    lineCenter = y + (0.5 * h)
    startPoint = (0, int(lineCenter))
    endPoint = (int(imageWidth), int(lineCenter))
    # Store start and end points in list:
    separatingLines.append((startPoint, endPoint))
    # Draw the line:
    color = (0, 255, 0)
    cv2.line(inputCopy, startPoint, endPoint, color, 2)
    # Show the image:
    cv2.imshow("inputCopy", inputCopy)
    cv2.waitKey(0)
I've stored the start and end points in the separatingLines list, so you can retrieve the data later if you need to. This is the result:
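As mentioned, there is more than one way to get the middle points. As a rough sketch of one alternative (not part of the code above), the gaps can also be located directly in the 1-D inverted column by looking for runs of 255, without going through contours. The helper `gap_midpoints` and the array `toyColumn` are hypothetical names used only for this illustration, and the sketch assumes the column holds exactly 0 or 255.

```python
import numpy as np

def gap_midpoints(column):
    """Return the middle row of every run of 255 values in a 1-D column."""
    # Flag gap rows, then pad with zeros so runs touching the borders are closed:
    isGap = (np.asarray(column).ravel() == 255).astype(np.int8)
    padded = np.concatenate(([0], isGap, [0]))
    # +1 marks the first row of a run, -1 marks one past its last row:
    changes = np.diff(padded)
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    # Same midpoint formula as in the loop: y + 0.5 * h, truncated to int:
    return [int(s + 0.5 * (e - s)) for s, e in zip(starts, ends)]

# Toy inverted column: text rows are 0, blank rows are 255.
toyColumn = np.array([0, 0, 255, 255, 255, 0, 0, 0, 255, 0], dtype=np.uint8)
midpoints = gap_midpoints(toyColumn)
print(midpoints)  # [3, 8]
```

With the real data you would pass `reducedImage` (after the inversion) instead of `toyColumn` and draw a line at each returned row, exactly as in the loop. Whether this is preferable to the contour approach is a matter of taste; it just avoids the polygon approximation step.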
