
How can I transcribe the text inside the highlighted areas of the following image with Tesseract in Python?

Input image

HansHirse

2 Answers

1

Assuming the highlighting uses a distinct color that doesn't appear anywhere else in the image (like the prominent red in your example), you can use color thresholding in the HSV color space with cv2.inRange.

To do so, you set up proper lower and upper limits for hue, saturation, and value. In the given example, we're detecting red-ish colors. In general, that would need two sets of limits, since red-ish colors sit at the "turnaround" of the hue cylinder (0/180 on OpenCV's 8-bit hue scale, which stores the hue angle divided by two). To overcome that and use only one set of limits, we shift the obtained hue channel by 90 and take the result modulo 180. The red-ish colors here are also highly saturated and quite bright, so we look at saturation levels above 80 % and value levels above 50 %. We get such a mask:

Mask
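To see why the shift works, here is a small NumPy-only sketch on a handful of made-up hue values. The sample values are an assumption for illustration; the 70 to 110 limits are the ones used in the code of this answer. It also shows why the channel is widened with astype(int) before adding: 179 + 90 doesn't fit into uint8.

```python
import numpy as np

# Made-up sample of OpenCV-style hue values (8-bit images use 0..179)
hue = np.array([0, 10, 20, 60, 120, 160, 170, 179], dtype=np.uint8)

# Widen first, since 179 + 90 would overflow uint8; then wrap around at 180
shifted = ((hue.astype(int) + 90) % 180).astype(np.uint8)

# Both ends of the original hue axis now fall into one contiguous range
is_red = (shifted >= 70) & (shifted <= 110)

print(shifted)  # [ 90 100 110 150  30  70  80  89]
print(is_red)   # [ True  True  True False False  True  True  True]
```

The hues near 0 and the hues near 179 both end up between 70 and 110, so a single cv2.inRange call is enough.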

The last thing to do is to obtain the contours from the generated mask, get the corresponding bounding rectangles, and run pytesseract on their content (grayscaled and thresholded using Otsu's method for better OCR performance). My suggestion would be to also use the --psm 6 option here.
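For readers unfamiliar with Otsu's method, the following is a NumPy-only sketch of the idea: pick the threshold that maximizes the between-class variance of the gray-level histogram. It is meant as an illustration, not as a replacement for cv2.threshold with cv2.THRESH_OTSU, which is what this answer actually uses. The function name and the tiny sample image are assumptions.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level t that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)           # probability of the class "level <= t"
    mu = np.cumsum(prob * levels)  # cumulative mean up to level t
    mu_total = mu[-1]
    w1 = 1.0 - w0                  # probability of the class "level > t"

    # Between-class variance; only defined where both classes are non-empty
    valid = (w0 > 0) & (w1 > 0)
    sigma_b = np.zeros(256)
    sigma_b[valid] = (mu_total * w0[valid] - mu[valid]) ** 2 / (w0[valid] * w1[valid])
    return int(np.argmax(sigma_b))

# Made-up bimodal sample: dark "text" pixels (50) and bright background (200)
gray = np.array([[50, 50, 50, 50, 200, 200, 200, 200]] * 2, dtype=np.uint8)
t = otsu_threshold(gray)
binary = gray > t  # True for the bright class
```

On such a cleanly bimodal crop, any threshold between the two modes separates text from background, which is why Otsu tends to help before OCR.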

Here's the full code including the results:

import cv2
import numpy as np
import pytesseract

# Read image
img = cv2.imread('E5PY2.jpg')

# Convert to HSV color space, and split channels
h, s, v = cv2.split(cv2.cvtColor(img, cv2.COLOR_BGR2HSV))

# Shift hue channel to detect red area using only one range
h_2 = ((h.astype(int) + 90) % 180).astype(h.dtype)

# Mask highlighted boxes using color thresholding
lower = np.array([ 70, int(0.80 * 255), int(0.50 * 255)])
upper = np.array([110, int(1.00 * 255), int(1.00 * 255)])
highlighted = cv2.inRange(cv2.merge([h_2, s, v]), lower, upper)

# Find contours w.r.t. the OpenCV version; retrieve bounding rectangles
cnts = cv2.findContours(highlighted, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]
rects = [cv2.boundingRect(cnt) for cnt in cnts]

# Iterate bounding boxes, and OCR
for x, y, w, h in rects:

    # Grayscale, and threshold using Otsu
    work = cv2.cvtColor(img[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)
    work = cv2.threshold(work, 0, 255, cv2.THRESH_OTSU)[1]

    # Pytesseract with --psm 6
    text = pytesseract.image_to_string(work, config='--psm 6')\
        .replace('\n', '').replace('\f', '')
    print('X: {}, Y: {}, Text: {}'.format(x, y, text))
    # X: 468, Y: 1574, Text: START MEDITATING
    # X: 332, Y: 1230, Text: Well done. By signing up, you’ve taken your first
    # X: 358, Y: 182, Text: Welcome
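
Note that the results come back in contour order, which is bottom to top here. If reading order matters, the rectangles can be sorted before the OCR loop. A minimal sketch: the x and y values are taken from the output above, while the widths and heights are made-up placeholders.

```python
# (x, y, w, h) tuples in the layout returned by cv2.boundingRect.
# x and y come from the printed results; w and h are placeholder values.
rects = [(468, 1574, 100, 40), (332, 1230, 100, 40), (358, 182, 100, 40)]

# Sort top to bottom, then left to right for boxes starting at the same y
rects_sorted = sorted(rects, key=lambda r: (r[1], r[0]))

for x, y, w, h in rects_sorted:
    print('X: {}, Y: {}'.format(x, y))
```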

Caveat: I use a special version of Tesseract from the Mannheim University Library.

----------------------------------------
System information
----------------------------------------
Platform:      Windows-10-10.0.19041-SP0
Python:        3.9.1
PyCharm:       2021.1.1
NumPy:         1.20.3
OpenCV:        4.5.2
pytesseract:   5.0.0-alpha.20201127
----------------------------------------
HansHirse
  • Thank you, but I do not have a highlighted area, just borders – Bro From Space May 27 '21 at 10:54
  • I know. The proposed approach finds those borders, and gets the area within to do the OCR there. I have to ask: Did you read and understand the presented code? If the results (last three lines, comments) are not what you expect, then you should make clear in your question, what your actual goal is. – HansHirse May 27 '21 at 11:02
  • Hi. Could you please give a bit more information about this code: 1) `h_2 = ((h.astype(int) + 90) % 180).astype(h.dtype)` 2) `work = cv2.cvtColor(img[y:y+h, x:x+w], cv2.COLOR_BGR2GRAY)` 3) `work = cv2.threshold(work, 0, 255, cv2.THRESH_OTSU)[1]` – Oleg May 30 '21 at 17:46
  • @Oleg 1) Please have a look at the linked Wikipedia article on the HSV color space. Red-ish colors can be found at hue values from 0° to maybe 20° and from maybe 340° to 360°. So, you'd need two sets of boundaries to detect red-ish colors, thus two `cv2.inRange` calls. To simplify that, I shift all hue values by 90°, such that red-ish colors can be found from 70° to 110°. 2) That's slicing (cropping to ROI), and color conversion to grayscale, needed for the following thresholding. 3) That's thresholding using [Otsu's method](https://en.wikipedia.org/wiki/Otsu%27s_method). – HansHirse May 31 '21 at 10:12
0

From top to bottom, the boxes are approximately at (x1, y1, x2, y2):

  • 0.2564, 0.1070, 0.6293, 0.166
  • 0.2377, 0.6826, 0.7645, 0.703
  • 0.331, 0.88, 0.6713, 0.913

These coordinates are relative to the image width and height. The full code would look like this:

import cv2
import pytesseract

image = cv2.imread('E5PY2.jpg')
coords = [[0.2564, 0.1070, 0.6293, 0.166],
          [0.2377, 0.6826, 0.7645, 0.703],
          [0.331, 0.88, 0.6713, 0.913]]
h, w, c = image.shape
for idx, (x1, y1, x2, y2) in enumerate(coords):
    x1 = int(x1 * w)
    x2 = int(x2 * w)
    y1 = int(y1 * h)
    y2 = int(y2 * h)
    print(pytesseract.image_to_string(image[y1:y2, x1:x2]))
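
The conversion from relative to pixel coordinates can also be pulled into a small helper. This is a sketch under assumptions: the function name and the image size are hypothetical, and it uses round() plus clamping to the image bounds instead of the plain int() truncation above.

```python
def to_pixel_box(box, width, height):
    """Convert a relative (x1, y1, x2, y2) box to clamped pixel coordinates."""
    x1, y1, x2, y2 = box

    def clamp(value, upper):
        # Keep the coordinate inside [0, upper]
        return max(0, min(upper, value))

    return (clamp(round(x1 * width), width),
            clamp(round(y1 * height), height),
            clamp(round(x2 * width), width),
            clamp(round(y2 * height), height))

# Hypothetical image size; the real one comes from image.shape
width, height = 1000, 2000
box = to_pixel_box((0.2564, 0.1070, 0.6293, 0.166), width, height)
print(box)  # (256, 214, 629, 332)
```

The resulting tuple can be used for slicing as image[y1:y2, x1:x2], just like in the loop above.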
Natthaphon Hongcharoen