Divide an image into tiles based on text structure in Python OpenCV

Question

I'm a beginner in computer vision and OpenCV, but I do have moderate experience with Python. I am trying to write a program that takes an image and divides it into tiles based on the structural organization of the text. For example, given a menu like the following,

I want to use computer vision to identify the tabular layout of the text and divide the image into tiles like the following:

As of now, my purpose isn't to extract the text using OCR. All I need to do is identify the (hidden) table structure in the image and divide it into individual cells, and extract them as sub-images. Any approaches I can use?

Sorry, I am really new to computer vision. Feel free to let me know if any libraries other than OpenCV are needed.

You could try dilation until the blocks coalesce and then find contours on that result. Then use the bounding boxes of the contours. — Pete, Jul 19 '23 at 17:40

score 1 · Accepted Answer · answered Jul 19 '23 at 18:25

I see you have mentioned that you do not want OCR. However, let me still go forward and post this solution here with EasyOCR.

import easyocr
import cv2 as cv
import numpy as np
import os

path = "menu.jpg"
assert os.path.exists(path)

#always a good idea to convert BGR to RGB when using OCR
img = cv.imread(path)
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)

viz_img = np.copy(img)

#read the text
reader = easyocr.Reader(['en'])
text_data = reader.readtext(img, paragraph=True, x_ths=0.5)     #with paragraph=True each entry is [box-coords, text]; confidence is omitted

print(text_data)

#visualize
for data in text_data:
    # box, text
    box, text = data
    top_left, top_right, bottom_right, bottom_left = box

    #OpenCV drawing functions expect integer points; tuples work across versions
    tl = tuple(int(x) for x in top_left)
    br = tuple(int(x) for x in bottom_right)
    cv.rectangle(viz_img, tl, br, (0, 255, 0), 4)
    cv.putText(viz_img, text, br, cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)

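#viz_img is RGB at this point; convert back to BGR, which cv.imwrite expects
cv.imwrite('viz_with_text.jpg', cv.cvtColor(viz_img, cv.COLOR_RGB2BGR))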

The documentation of EasyOCR is here.

Let me explain what I did.

Read the image and convert it to RGB. In my experience, converting to RGB gives better OCR results.
Set up the EasyOCR reader. The reader has three methods: detect for text detection, recognize for recognition, and readtext for the combined detection-and-recognition pipeline.
I have used the last method because it can merge vertically adjacent bounding boxes into paragraphs. That is what paragraph=True enables when calling the method. FYI, when you enable paragraph mode you won't get the confidence of the recognized text.
You can get the box of each section from the box coordinates returned by the EasyOCR reader. See the for loop in the code for how I parse the result. FYI, when paragraph mode is disabled you get the recognition confidence as a third value.
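Since the question asks for the cells as sub-images, here is a small sketch of cropping them from the returned boxes. It assumes each entry starts with a four-point box of (x, y) corners, as in the loop above; the helper name crop_tiles and the sample data are made up for illustration.

```python
import numpy as np


def crop_tiles(image, text_data):
    """Crop one sub-image per entry; each entry starts with a 4-point box of (x, y) corners."""
    height, width = image.shape[:2]
    tiles = []
    for entry in text_data:
        box = np.asarray(entry[0], dtype=float)
        # Take min/max over all four corners so slightly skewed boxes are fully covered
        x0, y0 = np.floor(box.min(axis=0)).astype(int)
        x1, y1 = np.ceil(box.max(axis=0)).astype(int)
        # Clamp to the image bounds
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, width), min(y1, height)
        if x1 > x0 and y1 > y0:
            tiles.append(image[y0:y1, x0:x1].copy())
    return tiles


# Hypothetical data in the [box, text] shape produced with paragraph=True
image = np.zeros((100, 200, 3), dtype=np.uint8)
sample = [
    [[[10, 10], [90, 10], [90, 40], [10, 40]], "Starters"],
    [[[110, 50], [210, 50], [210, 95], [110, 95]], "Mains"],
]
tiles = crop_tiles(image, sample)
```

With the script above this would be called as crop_tiles(img, text_data). Keep in mind that img is RGB there, so a tile should be converted back to BGR before being passed to cv.imwrite.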

To control how aggressively boxes are merged into paragraphs, adjust the parameters x_ths (horizontal merging) and y_ths (vertical merging).

Additional information: if some text is not detected properly, which affects the output of the code, adjust the detection parameters text_threshold, low_text and link_threshold.

Please refer to the EasyOCR documentation I have linked above for more details on the parameters.
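As a rough sketch, the merging and detection parameters mentioned above can all be passed as keyword arguments to readtext. The wrapper name read_blocks is hypothetical and the starting values are only illustrative; check the EasyOCR documentation for the actual defaults and meanings.

```python
def read_blocks(reader, image, x_ths=0.5, y_ths=0.5,
                text_threshold=0.7, low_text=0.4, link_threshold=0.4):
    """Run readtext in paragraph mode and return its [box, text] entries.

    reader is expected to be an easyocr.Reader instance; the numeric values
    are illustrative starting points, not recommendations.
    """
    return reader.readtext(
        image,
        paragraph=True,                 # merge nearby boxes into paragraphs
        x_ths=x_ths,                    # horizontal merging
        y_ths=y_ths,                    # vertical merging
        text_threshold=text_threshold,  # detection parameters to adjust when
        low_text=low_text,              # text is missed or fragmented
        link_threshold=link_threshold,
    )


# Usage (requires easyocr and its downloaded models):
#   reader = easyocr.Reader(['en'])
#   text_data = read_blocks(reader, img, x_ths=0.3, y_ths=0.8)
```

Taking the reader as an argument lets you try several settings on the same image without reloading the models each time.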

The result on the image you have provided is as follows.

Thank you so much. My apologies for any confusion from when I said "I don't want OCR". I simply meant that my main goal wasn't to get the exact text content, but your answer did solve what I needed :) When I tried to run the script, however, I got the error "certificate verify failed: unable to get local issuer certificate". Do you have any ideas on resolving it? — Kevin Tommy, Jul 19 '23 at 19:06
Check this question: https://stackoverflow.com/questions/52805115/certificate-verify-failed-unable-to-get-local-issuer-certificate. If the problem still persists, download the English recognition model under '2nd generation' and the CRAFT text detection model from https://www.jaided.ai/easyocr/modelhub/. Put these in a directory and pass its path to the 'model_storage_directory' parameter when initializing the Reader class, i.e. reader = easyocr.Reader(['en'], model_storage_directory = "path to your directory"). — tintin98, Jul 20 '23 at 04:44
@HappyNigerian if my answer solves the question can you please accept the answer? — tintin98, Jul 23 '23 at 19:42
