How to read punctuation characters like '/', '_' and '\' from an image

Question

I want my program to read /, _, and \ from an image but sometimes it reads / as I and /_\ as A. I am using the pytesseract library to do this. Is there a way to specifically read characters like /_ and \?

Please make sure to understand you should always try to produce a minimal reproducible example. More info here: https://stackoverflow.com/help/minimal-reproducible-example — Celius Stingher, Sep 05 '19 at 01:42

nathancy · Answer 1 · 2019-09-05T01:48:44.027

You can use pytesseract.image_to_string to read text from an image. Depending on your image, you may want to preprocess it before passing it to Pytesseract. This can be a combination of thresholding, blurring, or smoothing with morphological operations. Using this example image,

Here's the result printed to the console

We use the --psm 6 config flag to treat the image as a single uniform block of text. Tesseract has additional configuration flags that could also be useful.

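import cv2
import pytesseract

# Windows only: point pytesseract at the Tesseract executable (adjust or remove on other systems)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the image as grayscale
image = cv2.imread('1.png', cv2.IMREAD_GRAYSCALE)

# --psm 6: assume a single uniform block of text
data = pytesseract.image_to_string(image, lang='eng', config='--psm 6')
print('Result:', data)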
