How to detect multiple blocks of text from an image of a document?

Question

I have images of Math question papers which have multiple questions per page. Example: Math questions image. I want to use Python to extract the contents of each question separately and store them in a database table. From my research, I have a rough idea for my workflow: Pre-process image --> Find contours of each question --> Snip and send those individual images to pyTesseract --> Store the transcribed text.

I was very happy to find a great thread about a similar problem, but when I tried that approach on my image, the ROI that was identified covered the whole page. In other words, it identified all the questions as one block of text.

How do I make OpenCV recognize multiple ROIs within a page and draw bounding boxes? Is there something different to be done during the pre-processing?

Please suggest an approach - thanks so much!

Invert so text is white on black background. Use morphology close to connect all the text within each block only. Then get contours of that and their bounding box. Then crop the original image from those bounding box coordinates. — fmw42, Sep 18 '20 at 18:24

score 1 · Answer 1 · answered Sep 19 '20 at 17:33

1. Convert the image to grayscale and blur it slightly to suppress noise.
2. Apply Otsu's threshold (inverted). It picks the threshold automatically, removes the background, and leaves the text white on black.
3. Choose a structuring element shape and kernel size. A larger kernel merges more neighbouring text into a single rectangle; a smaller kernel splits the page into more rectangles.
4. Dilate the thresholded image with that kernel. Dilation thickens the text so that nearby characters and lines merge into one blob per block.
5. Find the contours of the dilated image.
6. Loop over the contours and draw each bounding rectangle with cv2.rectangle.

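import cv2

# Load the page, convert it to grayscale and blur lightly.
img = cv2.imread("text.jpg")
if img is None:
    raise FileNotFoundError("text.jpg could not be read")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray, (5, 5), 0)

# Otsu picks the threshold; THRESH_BINARY_INV makes the text white on black.
ret, thresh1 = cv2.threshold(blur, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY_INV)

# A larger kernel merges more neighbouring text into one block.
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18))
dilation = cv2.dilate(thresh1, rect_kernel, iterations=1)

# OpenCV 4.x returns (contours, hierarchy); 3.x returns three values.
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)

# Draw on a copy so the original image stays clean for cropping later.
output = img.copy()
for cnt in contours:
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(output, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite('drawed.png', output)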

Sample output image
