
I'm just trying to make a simple program to OCR an entire page, but I am getting an encoding error, a kind of error I've always had trouble fixing.

My code:

from PIL import Image
import pytesseract

text = pytesseract.image_to_string(Image.open('005.png'))
print(text)

My error:

File "c:/Users/Dylan C/Desktop/Comparitor/image.py", line 4, in <module> print(text)

File "C:\Users\Dylan C\AppData\Local\Programs\Python\Python35\lib\encodings\cp437.py", line 19, in encode return codecs.charmap_encode(input,self.errors,encoding_map)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\u2019' in position 187: character maps to <undefined>

Sorry if this is a stupid question. I have JUST downloaded Tesseract and am no expert in programming.

IIlIll IIIlII
  • Possible duplicate of [Python, Unicode, and the Windows console](https://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console) – snakecharmerb Mar 01 '19 at 18:55

1 Answer

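As the error states, the problem is in print(text), not in the OCR step: you are printing Unicode text to a console whose encoding (cp437, according to the traceback) has no mapping for the character '\u2019', the right single quotation mark.
Search for a Windows solution to UnicodeEncodeError on print, e.g. [Python, Unicode, and the Windows console](https://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console).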
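
One possible workaround, as a minimal sketch rather than the definitive fix: replace any character the console cannot encode before printing. The helper name `to_console_safe` is hypothetical, and the sample string only stands in for the OCR output; the encoding defaults to whatever `sys.stdout` reports.

```python
import sys


def to_console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # Use the console's own encoding when none is given; fall back to UTF-8
    # if stdout does not report one.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # Round-trip through bytes: unencodable characters become '?'.
    return text.encode(encoding, errors="replace").decode(encoding)


# Stand-in for the OCR result; '\u2019' is the character from the traceback.
sample = "It\u2019s a test"
print(to_console_safe(sample, "cp437"))
```

In the question's script this would amount to `print(to_console_safe(text))`. The replacement is lossy, so if the exact characters matter, writing the text to a file opened with `encoding='utf-8'` avoids the console altogether.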

user898678