I want my program to read /, _, and \ from an image, but sometimes it reads / as I and /_\ as A. I am using the pytesseract library to do this.
Is there a way to specifically read characters like /, _, and \?
Please make sure to understand you should always try to produce a minimal reproducible example. More info here: https://stackoverflow.com/help/minimal-reproducible-example – Celius Stingher Sep 05 '19 at 01:42
1 Answer
You can use pytesseract.image_to_string to read text from an image. Depending on your image, you may want to preprocess it before passing it to Pytesseract. This can be a combination of thresholding, blurring, or smoothing with morphological operations.
Using this example image, here's the result printed to the console
We use the --psm 6 config flag since we want to treat the image as a single uniform block of text. Here are some additional configuration flags that could be useful
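As a rough sketch of how such flags can be combined into the config string, the helper below (build_tesseract_config is a hypothetical name, not part of pytesseract) joins a page segmentation mode, an optional engine mode, and an optional character whitelist. --psm, --oem, and the tessedit_char_whitelist variable are Tesseract options, but whether the whitelist is honored depends on your Tesseract version and engine, so treat it as something to try rather than a guaranteed fix. The example leaves \ out of the whitelist on purpose: as far as I know pytesseract splits the config string shell-style, so a backslash may need extra escaping that I have not verified.

```python
def build_tesseract_config(psm=6, oem=None, whitelist=None):
    """Build a config string for pytesseract (hypothetical helper).

    psm: page segmentation mode, e.g. 6 = single uniform block of text,
         7 = single text line, 10 = single character.
    oem: OCR engine mode, e.g. 0 = legacy engine, 1 = LSTM engine.
    whitelist: characters Tesseract should restrict itself to
               (assumption: your Tesseract build honors this variable).
    """
    parts = [f"--psm {psm}"]
    if oem is not None:
        parts.append(f"--oem {oem}")
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


# Restrict recognition to "/" and "_" in a single block of text.
config = build_tesseract_config(psm=6, whitelist="/_")
print(config)
```

The resulting string can be passed as the config argument of pytesseract.image_to_string. If the characters sit on one line, --psm 7 may be worth comparing against --psm 6.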
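import cv2
import pytesseract

# Point pytesseract at the Tesseract binary (Windows install path; adjust for your system)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the image as grayscale
image = cv2.imread('1.png', cv2.IMREAD_GRAYSCALE)
# --psm 6: treat the image as a single uniform block of text
data = pytesseract.image_to_string(image, lang='eng', config='--psm 6')
print('Result:', data)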

nathancy