
I want my program to read the characters /, _, and \ from an image, but sometimes it reads / as I and /_\ as A. I am using the pytesseract library to do this. Is there a way to make it specifically recognize characters like /, _, and \?

nathancy
senear
  • Please make sure you understand that you should always try to produce a minimal reproducible example. More info here: https://stackoverflow.com/help/minimal-reproducible-example – Celius Stingher Sep 05 '19 at 01:42

1 Answer


You can use pytesseract.image_to_string to read text from an image. Depending on your image, you may want to preprocess it before passing it to Pytesseract, using some combination of thresholding, blurring, or smoothing with morphological operations. Using this example image:

[Image: example input]

Here's the result printed to the console:

[Image: console output]

We use the --psm 6 config flag because we want Tesseract to treat the image as a single uniform block of text. Other page segmentation modes may suit other layouts; running tesseract --help-psm lists them.

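```python
import cv2
import pytesseract

# Only needed on Windows when tesseract.exe is not on your PATH
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Read the image as grayscale (equivalent to passing flag 0)
image = cv2.imread('1.png', cv2.IMREAD_GRAYSCALE)
# --psm 6: treat the image as a single uniform block of text
data = pytesseract.image_to_string(image, lang='eng', config='--psm 6')
print('Result:', data)
```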
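The code above does not stop Tesseract from choosing letters such as I or A. One option not covered in the answer, sketched here as an assumption to verify against your Tesseract version, is Tesseract's tessedit_char_whitelist variable, passed with -c in the config string so that only /, _, and \ can be output (with the LSTM engine this reportedly needs Tesseract 4.1 or later). It also assumes pytesseract splits config with shlex.split using POSIX rules except on Windows, so the backslash is quoted on POSIX. build_config and read_slashes are hypothetical helper names.

```python
import shlex
import sys


def build_config(whitelist: str, psm: int = 6) -> str:
    """Return a pytesseract config string that limits output to `whitelist`.

    Assumption: pytesseract splits `config` with shlex.split, using POSIX
    rules everywhere except Windows. Under POSIX rules a bare backslash is an
    escape character, so the value is single-quoted there to keep it literal.
    """
    value = whitelist if sys.platform == "win32" else shlex.quote(whitelist)
    return f"--psm {psm} -c tessedit_char_whitelist={value}"


def read_slashes(path: str) -> str:
    """OCR `path`, allowing only '/', '_' and '\\' (needs cv2 and pytesseract)."""
    # Imported lazily so build_config works without these packages installed.
    # On Windows you may also need to set pytesseract.pytesseract.tesseract_cmd.
    import cv2
    import pytesseract

    image = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if image is None:  # cv2.imread returns None instead of raising
        raise FileNotFoundError(path)
    return pytesseract.image_to_string(image, lang="eng", config=build_config("/_\\"))
```

Usage: print(read_slashes('1.png')). Because the whitelist contains no letters, Tesseract should not be able to report / as I, although it may still confuse the three allowed symbols with each other.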
nathancy