How to transcribe text from the highlighted areas of an image?

Question

How can I transcribe the text in the highlighted areas of the following image with Tesseract in Python?

The question should be how to crop those areas from the image. But this can be done in many ways, and you need to specify your use case. For example, what do the images other than this one look like? — Natthaphon Hongcharoen, May 26 '21 at 20:13
Or how did you select these words in the first place? Why not select other words? — Natthaphon Hongcharoen, May 26 '21 at 20:15
It's only an example. In this case, all images are similar and have the same highlighted areas. I only want to know how I can transcribe the text from these areas. — Bro From Space, May 26 '21 at 20:22
Do you mean something like `pytesseract.image_to_string(image[100:350, 50: 100])`? — Natthaphon Hongcharoen, May 26 '21 at 20:24
It'll work with other images as long as all the text you want is in the exact same place. But if the text moves outside the red box, it won't work. — Natthaphon Hongcharoen, May 26 '21 at 20:28
Otherwise you need to specify how you chose those words. For example, the first word in the first line, etc. — Natthaphon Hongcharoen, May 26 '21 at 20:29
Yes. Right now all images will have their red rectangles at the same positions. — Bro From Space, May 26 '21 at 20:33

score 1 · Accepted Answer · answered May 27 '21 at 09:09

Assuming you have a distinct color for the highlighted areas, which isn't present in the remaining image – like the prominent red color for the highlighting in your example – you can use color thresholding using the HSV color space incorporating cv2.inRange.

To do that, you set up suitable lower and upper limits for hue, saturation, and value. In the given example, we're detecting reddish colors. In general, that would need two sets of limits, since reddish colors sit at the 0/180 wrap-around of OpenCV's hue channel (8-bit hue is stored as half degrees, 0 to 179). To avoid that and use only one set of limits, we shift the hue channel by 90 and take the result modulo 180. The highlighting is also highly saturated and quite bright, so we look at saturation levels above 80 % and value levels above 50 %. This yields the following mask:

The last step is to obtain the contours from the generated mask, get the corresponding bounding rectangles, and run pytesseract on the content of each one (grayscaled, then thresholded using Otsu's method for better OCR performance). My suggestion would be to also use the `--psm 6` option here.
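As a small illustration of the hue shift described above, here is a minimal NumPy-only sketch. The sample hue values are made up; the limits 70 and 110 are the ones used in this answer. It suggests that one range on the shifted channel selects the same pixels as two ranges on the original channel.

```python
import numpy as np

# OpenCV stores 8-bit hue as 0..179 (degrees divided by 2).
# These sample values are illustrative, not taken from the image.
hue = np.array([0, 5, 20, 60, 120, 160, 175, 179], dtype=np.uint8)

# Widen to int before adding so the 8-bit values cannot overflow,
# then wrap around at 180 and go back to the original dtype.
shifted = ((hue.astype(int) + 90) % 180).astype(hue.dtype)

# One range on the shifted channel ...
single_range = (shifted >= 70) & (shifted <= 110)

# ... selects the same entries as two ranges on the original channel.
two_ranges = (hue <= 20) | (hue >= 160)

print(shifted.tolist())       # [90, 95, 110, 150, 30, 70, 85, 89]
print(single_range.tolist())  # [True, True, True, False, False, True, True, True]
print(bool((single_range == two_ranges).all()))  # True
```

The `astype(int)` step matters: adding 90 directly to a `uint8` value of 179 would exceed 255 and wrap incorrectly.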

Here's the full code including the results:

import cv2
import numpy as np
import pytesseract

# Read image
img = cv2.imread('E5PY2.jpg')

# Convert to HSV color space, and split channels
h, s, v = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))

# Shift hue channel to detect red area using only one range
h_2 = ((h.astype(int) + 90) % 180).astype(h.dtype)

# Mask highlighted boxes using color thresholding
lower = np.array([ 70, int(0.80 * 255), int(0.50 * 255)])
upper = np.array([110, int(1.00 * 255), int(1.00 * 255)])
highlighted = cv2.inRange(cv2.merge([h_2, s, v]), lower, upper)

# Find contours w.r.t. the OpenCV version; retrieve bounding rectangles
cnts = cv2.findContours(highlighted, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
rects = [cv2.boundingRect(cnt) for cnt in cnts]

# Iterate bounding boxes, and OCR
for x, y, w, h in rects:

    # Grayscale, and threshold using Otsu
    work = cv2.cvtColor(img[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    work = cv2.threshold(work, 0, 255, cv2.THRESH_OTSU)[1]

    # Pytesseract with --psm 6 (treat the crop as a single uniform block of text)
    text = pytesseract.image_to_string(work, config='--psm 6')\
        .replace('\n', '').replace('\f', '')
    print('X: {}, Y: {}, Text: {}'.format(x, y, text))
    # X: 468, Y: 1574, Text: START MEDITATING
    # X: 332, Y: 1230, Text: Well done. By signing up, you’ve taken your first
    # X: 358, Y: 182, Text: Welcome

Caveat: I use a special version of Tesseract from the Mannheim University Library.

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.1
NumPy:         1.20.3
OpenCV:        4.5.2
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
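The printed results above come out bottom-up (Y: 1574, 1230, 182). If reading order matters, the rectangles could be sorted before the OCR loop; small specks in the mask could be dropped at the same time. The following is only a sketch: `reading_order` and `MIN_SIDE` are hypothetical names, and the widths and heights in the sample data are invented for illustration (only the x and y values come from the output above).

```python
# Bounding rectangles as (x, y, w, h), as returned by cv2.boundingRect.
# The x/y values match the printed results; w/h are made up, and the last
# entry is an invented noise speck.
rects = [
    (468, 1574, 300, 60),
    (332, 1230, 580, 40),
    (358, 182, 400, 110),
    (10, 10, 3, 2),
]

MIN_SIDE = 10  # hypothetical noise threshold in pixels


def reading_order(rects, min_side=MIN_SIDE):
    """Drop tiny rectangles, then sort top-to-bottom and left-to-right."""
    kept = [r for r in rects if r[2] >= min_side and r[3] >= min_side]
    return sorted(kept, key=lambda r: (r[1], r[0]))


ordered = reading_order(rects)
print([(x, y) for x, y, w, h in ordered])
# [(358, 182), (332, 1230), (468, 1574)]
```

In the full code, `rects = reading_order(rects)` would go right before the `for x, y, w, h in rects:` loop.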

I know. The proposed approach finds those borders, and gets the area within to do the OCR there. I have to ask: Did you read and understand the presented code? If the results (last three lines, comments) are not what you expect, then you should make clear in your question, what your actual goal is. — HansHirse, May 27 '21 at 11:02
Hi. Could you please give a bit more information about this code: 1) `h_2 = ((h.astype(int) + 90) % 180).astype(h.dtype)` 2) `work = cv2.cvtColor(img[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)` 3) `work = cv2.threshold(work, 0, 255, cv2.THRESH_OTSU)[1]` — Oleg, May 30 '21 at 17:46
@Oleg 1) Please have a look at the linked Wikipedia article on the HSV color space. Red-ish colors can be found at hue values from 0° to maybe 20° and from maybe 340° to 360°. So, you'd need two sets of boundaries to detect red-ish colors, thus two `cv2.inRange` calls. To simplify that, I shift all hue values by 90°, such that red-ish colors can be found from 70° to 110°. 2) That's slicing (cropping to ROI), and color conversion to grayscale, needed for the following thresholding. 3) That's thresholding using [Otsu's method](https://en.wikipedia.org/wiki/Otsu%27s_method). — HansHirse, May 31 '21 at 10:12

Natthaphon Hongcharoen · Answer 2 · 2021-05-26T21:04:22.960

From top to bottom, the boxes are approximately at (x1, y1, x2, y2):

0.2564, 0.1070, 0.6293, 0.166
0.2377, 0.6826, 0.7645, 0.703
0.331, 0.88, 0.6713, 0.913

These values are relative to the image width and height. The full code would look like this:

import cv2
import pytesseract

image = cv2.imread('E5PY2.jpg')
coords = [[0.2564, 0.1070, 0.6293, 0.166],
          [0.2377, 0.6826, 0.7645, 0.703],
          [0.331, 0.88, 0.6713, 0.913]]
h, w, c = image.shape
for idx, (x1, y1, x2, y2) in enumerate(coords):
    x1 = int(x1 * w)
    x2 = int(x2 * w)
    y1 = int(y1 * h)
    y2 = int(y2 * h)
    print(pytesseract.image_to_string(image[y1:y2, x1:x2]))
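The conversion between relative and pixel coordinates used above can be sketched as a pair of helpers. The names `to_pixels` and `to_relative` and the image size are hypothetical. One way to obtain such fractions, which may or may not be how the values above were found, is to read the pixel coordinates of each box in an image viewer and divide them by the image size.

```python
def to_pixels(box, width, height):
    """Convert a relative (x1, y1, x2, y2) box to integer pixel coordinates."""
    x1, y1, x2, y2 = box
    return int(x1 * width), int(y1 * height), int(x2 * width), int(y2 * height)


def to_relative(box, width, height):
    """Convert a pixel (x1, y1, x2, y2) box to fractions of the image size."""
    x1, y1, x2, y2 = box
    return x1 / width, y1 / height, x2 / width, y2 / height


# Hypothetical image size; with OpenCV, take it from `h, w, c = image.shape`.
width, height = 1000, 2000

print(to_pixels((0.25, 0.5, 0.75, 0.625), width, height))
# (250, 1000, 750, 1250)
print(to_relative((250, 1000, 750, 1250), width, height))
# (0.25, 0.5, 0.75, 0.625)
```

Storing the boxes as fractions keeps them valid if the same layout is captured at a different resolution, as long as the aspect ratio and the positions of the highlighted areas stay the same.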

Thanks, but how did you find the coordinates? — Bro From Space, May 27 '21 at 07:28
