Encoding error when printing tesseract output

Question

I'm just trying to make a simple program to OCR an entire page, but I'm getting an encoding error, which is something I've always had trouble fixing.

My code:

from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open('005.png'))
print(text)

My error:

File "c:/Users/Dylan C/Desktop/Comparitor/image.py", line 4, in <module>
    print(text)

File "C:\Users\Dylan C\AppData\Local\Programs\Python\Python35\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 187: character maps to <undefined>
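The failing step can be reproduced without Tesseract at all: it happens when the console's codec (cp437 in this traceback) is asked to encode a character it has no mapping for. A minimal sketch using only the standard library; `find_unencodable` is a hypothetical helper name, and it reports code points as `U+XXXX` strings so that printing the result cannot itself trigger the same error.

```python
def find_unencodable(text, encoding):
    """Return (index, code point) pairs for characters `encoding` cannot represent."""
    bad = []
    for index, char in enumerate(text):
        try:
            char.encode(encoding)
        except UnicodeEncodeError:
            # Report the code point as plain ASCII so printing it is always safe.
            bad.append((index, "U+{:04X}".format(ord(char))))
    return bad


# U+2019 RIGHT SINGLE QUOTATION MARK (a curly apostrophe) often appears in OCR output.
sample = "It\u2019s a test"
print(find_unencodable(sample, "cp437"))  # [(2, 'U+2019')]
```

Running it on the real OCR result with `"cp437"` would list every character that makes `print(text)` fail on that console.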

Sorry if this is a stupid question; I have JUST downloaded Tesseract and am no expert in programming.

Possible duplicate of [Python, Unicode, and the Windows console](https://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console) — snakecharmerb, Mar 01 '19 at 18:55

score 0 · Answer 1 · answered Mar 01 '19 at 07:51


As the error states, the problem is in print(text): you are trying to print Unicode text to a console/environment whose encoding (here cp437) cannot represent some of its characters, in this case '\u2019', a curly apostrophe.
Search for "print UnicodeEncodeError windows" to find solutions, e.g. Python, Unicode, and the Windows console.
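Two common workarounds, sketched under the assumption that `text` holds the string returned by `pytesseract.image_to_string`: replace the characters the console cannot show, or write the full text to a UTF-8 file instead of printing it. `console_safe` and `save_text` are hypothetical helper names, and `005.txt` is an illustrative output path. Separately, Python 3.6 changed the Windows console encoding to UTF-8 (PEP 528), so upgrading from the Python 3.5 in the traceback may let the plain `print` work in a real console window, though redirected output can still use a narrower encoding.

```python
import sys


def console_safe(text, encoding=None):
    """Replace characters the target encoding can't represent with '?'."""
    # Default to stdout's encoding; fall back to UTF-8 if it is unavailable.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(encoding, errors="replace").decode(encoding)


def save_text(text, path):
    """Write text to a UTF-8 file so no characters are lost."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


if __name__ == "__main__":
    text = "It\u2019s OCR output"  # stand-in for pytesseract.image_to_string(...)
    print(console_safe(text))      # on a cp437 console this prints "It?s OCR output"
    save_text(text, "005.txt")     # the file keeps the real apostrophe
```

Replacing characters is lossy, so it is only suitable for a quick look at the output; writing to a UTF-8 file keeps the OCR result intact.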


user898678

