I see you mentioned that you do not want OCR. However, I will still post this solution here, which uses EasyOCR.
import easyocr
import cv2 as cv
import numpy as np
import os
path = "menu.jpg"
assert os.path.exists(path)
#always a good idea to convert BGR to RGB when using OCR
img = cv.imread(path)
img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
viz_img = np.copy(img)
#read the text
reader = easyocr.Reader(['en'])
text_data = reader.readtext(img, paragraph=True, x_ths=0.5) #each entry is ([box-coords], text); the confidence is omitted when paragraph=True
print(text_data)
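#visualize
for data in text_data:
    # with paragraph=True each entry is (box, text)
    box, text = data
    top_left, top_right, bottom_right, bottom_left = box
    # OpenCV expects integer pixel coordinates
    tl = tuple(int(x) for x in top_left)
    br = tuple(int(x) for x in bottom_right)
    cv.rectangle(viz_img, tl, br, (0, 255, 0), 4)
    cv.putText(viz_img, text, br, cv.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 2)
#viz_img is RGB, but cv.imwrite expects BGR, so convert back before saving
cv.imwrite('viz_with_text.jpg', cv.cvtColor(viz_img, cv.COLOR_RGB2BGR))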
The documentation of EasyOCR is here.
Let me explain what I did.
- Read the image and convert it to RGB. In my own experience, converting to RGB gives better OCR results.
- Set up the EasyOCR reader. The reader has three main methods: detect for text detection, recognize for recognition, and readtext for the combined detection and recognition pipeline.
- I used readtext because it can merge neighbouring bounding boxes into paragraphs. This is what paragraph=True enables when calling the method. FYI, with paragraph mode enabled you do not get a confidence value for the recognized text.
- You can get the box details of each section from the box coordinates returned by the EasyOCR reader. The for loop in the code shows how I parse the result. FYI, when paragraph mode is disabled you also get the recognition confidence as a third value.
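Since the shape of each entry depends on whether paragraph mode is on, a small helper that accepts both shapes can make the parsing less fragile. This is only a sketch: `parse_entry` is a name I made up, and the `sample` entries are hand-written stand-ins for what readtext returns, not real output.

```python
def parse_entry(entry):
    """Normalize one readtext entry into a dict.

    Accepts (box, text) as returned with paragraph=True and
    (box, text, confidence) as returned with paragraph=False.
    """
    box, text = entry[0], entry[1]
    confidence = entry[2] if len(entry) > 2 else None
    # box is four corner points: top-left, top-right, bottom-right, bottom-left
    xs = [int(point[0]) for point in box]
    ys = [int(point[1]) for point in box]
    return {
        "text": text,
        "confidence": confidence,
        "x": min(xs),
        "y": min(ys),
        "width": max(xs) - min(xs),
        "height": max(ys) - min(ys),
    }


# Hand-written sample entries (hypothetical, for illustration only)
sample = [
    ([[10, 20], [110, 20], [110, 60], [10, 60]], "Espresso 2.50"),
    ([[10, 80], [90, 80], [90, 120], [10, 120]], "Latte", 0.93),
]
parsed = [parse_entry(entry) for entry in sample]
for item in parsed:
    print(item)
```

With real data you would pass each element of text_data to the helper instead of the sample list.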
To control how aggressively boxes are merged into paragraphs, you need to play with the parameters x_ths (horizontal merging) and y_ths (vertical merging).
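One way to find good values is to try a few combinations and see how many paragraphs come out of each. The sketch below assumes `reader` is an easyocr.Reader and `image` is the RGB array from the code above. `sweep_merge_thresholds` is a hypothetical helper name and the candidate values are arbitrary starting points.

```python
def sweep_merge_thresholds(reader, image, x_values=(0.5, 1.0, 2.0), y_values=(0.5, 1.0)):
    """Run readtext in paragraph mode for each (x_ths, y_ths) pair.

    Returns a dict mapping (x_ths, y_ths) to the number of merged boxes,
    which gives a rough idea of how aggressively the boxes were merged.
    """
    counts = {}
    for x_ths in x_values:
        for y_ths in y_values:
            result = reader.readtext(image, paragraph=True, x_ths=x_ths, y_ths=y_ths)
            counts[(x_ths, y_ths)] = len(result)
    return counts
```

Usage would be something like `print(sweep_merge_thresholds(reader, img))`. Fewer boxes means more merging, so pick the pair whose count matches the number of sections you expect.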
Additional information: if your text is not being detected properly, which in turn affects the output of the code, you have to play with the detection parameters text_threshold, low_text and link_threshold.
Please refer to the EasyOCR documentation I have linked above for more details on the parameters.
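If detection misses text, you could try a list of detection settings in order and keep the first one that finds anything. Again, this is a sketch under assumptions: `reader` is assumed to be an easyocr.Reader, `first_setting_with_text` is a made-up helper, and the values in `candidate_settings` are illustrative only, not recommendations.

```python
def first_setting_with_text(reader, image, settings):
    """Try each dict of readtext keyword arguments in order.

    Returns (setting, result) for the first setting that yields any text,
    or (None, []) if none of them does.
    """
    for setting in settings:
        result = reader.readtext(image, **setting)
        if result:
            return setting, result
    return None, []


# Illustrative values only; later entries are more permissive than earlier ones
candidate_settings = [
    {"text_threshold": 0.7, "low_text": 0.4, "link_threshold": 0.4},
    {"text_threshold": 0.5, "low_text": 0.3, "link_threshold": 0.3},
]
```

You would call it as `setting, result = first_setting_with_text(reader, img, candidate_settings)` and then inspect which setting was picked.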
The result on the image you have provided is as follows.
