Finding contours with lines of text in OpenCV

Question

I am writing a text recognition program, and I have a problem with sorting contours. The program works fine for a single line of text, but on a whole block of text it fails to detect the lines about 80% of the time. What would be an efficient way to extract one line of text and then each of the other lines, one at a time?

What I want to achieve:

score 25 · Accepted Answer · edited Jun 20 '20 at 09:12

There is a sequence of steps to achieve this:

Find the optimum threshold to binarize your image. I used Otsu's threshold.
Find a suitable morphological operation that merges each line into a single region along the horizontal direction. Choose a kernel that is wider than it is tall.
Draw bounding boxes over the resulting contours.

UPDATE

Here is the implementation:

import cv2

x = 'C:/Users/Desktop/text.jpg'

img = cv2.imread(x)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  

#--- performing Otsu threshold ---
ret,thresh1 = cv2.threshold(gray, 0, 255,cv2.THRESH_OTSU|cv2.THRESH_BINARY_INV)
cv2.imshow('thresh1', thresh1)

#--- choosing the right kernel
#--- kernel size of 3 rows (to join dots above letters 'i' and 'j')
#--- and 15 columns to join neighboring letters in words and neighboring words
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (15, 3))
dilation = cv2.dilate(thresh1, rect_kernel, iterations = 1)
cv2.imshow('dilation', dilation)

#---Finding contours ---
#--- findContours returns 3 values in OpenCV 3.x and 2 values in 4.x; [-2:] works with both
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]

im2 = img.copy()
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imshow('final', im2)

#--- keep the windows open until a key is pressed ---
cv2.waitKey(0)
cv2.destroyAllWindows()
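
To get the lines one at a time, as the question asks, one option is to sort the bounding boxes by their top edge and crop each one from the image. This is only a minimal sketch: the helper names `sort_boxes_top_to_bottom` and `crop_lines` and the `min_height` cutoff are my own illustrative choices, not part of OpenCV, and the sort assumes each text line ended up as a single box after dilation.

```python
def sort_boxes_top_to_bottom(boxes):
    """Sort (x, y, w, h) boxes by top edge, then by left edge."""
    return sorted(boxes, key=lambda b: (b[1], b[0]))


def crop_lines(image, boxes, min_height=5):
    """Yield one cropped region per box, from the top of the page down.

    `image` is expected to be a NumPy array such as the one returned by
    cv2.imread. `min_height` is an assumed value used to skip tiny specks.
    """
    for x, y, w, h in sort_boxes_top_to_bottom(boxes):
        if h < min_height:
            continue
        # NumPy images are indexed as [rows, columns], i.e. [y, x]
        yield image[y:y + h, x:x + w]
```

With the code above it could be used as `boxes = [cv2.boundingRect(cnt) for cnt in contours]` followed by `for line_img in crop_lines(img, boxes): ...`. If the text is skewed or a line splits into several boxes, sorting by the top edge alone may not give a clean reading order, and the boxes would need to be grouped by vertical overlap first.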

Pretty slick. I like that trick of dilating with a wide kernel to join letters and words. — bfris, Jun 10 '18 at 20:15
@Rudrashah You can perform OCR on the extracted portion to get the result in string format. — Jeru Luke, Apr 21 '20 at 06:50
