pythonNotes = open('E:\\Python Notes.docx','r')
read_it_now = pythonNotes.read()
print(read_it_now.encode('utf-16'))

When I try this code, I get:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 591: character maps to <undefined>

I am running this in Visual Studio with Python Tools, using "Start Without Debugging".

I have tried putting enc='utf-8' at the top and passing it in as a parameter, and I've looked at other questions, but I just couldn't find a solution to this simple issue.

Please assist.

Asked by onion; edited by Božo Stojković.
  • Did you try `open('E:\\Python Notes.docx','r', encoding='utf-8')` (note `encoding`, not `enc`)? Also, which version of Python are you using? – Aurora0001 Jul 31 '16 at 19:21
  • A .docx file is a binary file, so you aren't going to be able to print anything coherent without more work. You could open it in binary mode (`'rb'`) and use the `zipfile` module to extract the XML data inside. – Mark Tolonen Jul 31 '16 at 19:29
  • Hey Aurora, that `encoding='utf-8'` does not work. And Mark, would that be `'rb'` instead of `'r'`? I think I found a site that shows how to do the XML part, so I can try that. – onion Jul 31 '16 at 19:37
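To illustrate Mark's suggestion: yes, the mode would be `'rb'` instead of `'r'`. A minimal sketch, assuming any .docx path (the function name `peek_docx` is hypothetical). In binary mode Python returns raw bytes and never runs a text codec, so the UnicodeDecodeError cannot occur while reading; the first bytes also show that the file is really a ZIP archive.

```python
import zipfile


def peek_docx(path):
    """Hypothetical helper: return the first 4 bytes of a file and whether it is a ZIP."""
    # 'rb' = read bytes; no decoding step, so no UnicodeDecodeError here.
    with open(path, "rb") as f:
        header = f.read(4)
    # A .docx is a ZIP archive; ZIP files normally start with b"PK\x03\x04".
    return header, zipfile.is_zipfile(path)
```

For example, `peek_docx('E:\\Python Notes.docx')` would be expected to return `(b'PK\x03\x04', True)` for a normal Word document.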

1 Answer


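This error can occur when text that is already UTF-8 encoded is read with an 8-bit encoding, and Python tries to "decode" it to Unicode: bytes that have no meaning in the assumed encoding raise a UnicodeDecodeError. On Windows, `open()` without an `encoding` argument uses the locale's default codec, here a 'charmap' codec such as cp1252, in which byte 0x8f is unassigned. Conversely, reading a file as UTF-8 will usually fail if it is not actually UTF-8 encoded, because most arbitrary byte sequences are not valid UTF-8.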
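The decoding behaviour described above can be reproduced without any file. This is a small sketch; `try_decode` is a hypothetical helper, and cp1252 is assumed as the 'charmap' codec (the actual default depends on the machine's locale).

```python
def try_decode(raw: bytes, codec: str):
    """Decode `raw` with `codec`; return (text, None) on success or (None, error)."""
    try:
        return raw.decode(codec), None
    except UnicodeDecodeError as exc:
        return None, exc


# UTF-8 text read as cp1252: "é" (C3 A9) silently becomes mojibake, no error.
print(try_decode("café".encode("utf-8"), "cp1252")[0])  # cafÃ©

# "Ï" is stored in UTF-8 as C3 8F; cp1252 has no character for 0x8f, so it fails.
print(try_decode("Ï".encode("utf-8"), "cp1252")[1])
# 'charmap' codec can't decode byte 0x8f in position 1: character maps to <undefined>

# The same lone byte is not valid UTF-8 either.
print(try_decode(b"\x8f", "utf-8")[1])
# 'utf-8' codec can't decode byte 0x8f in position 0: invalid start byte
```

Neither codec is "right" for a .docx; the bytes simply are not text in any single encoding.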

In your case, though, the real problem is that a .docx file is not a plain text file at all: it is a ZIP archive of XML files, so no single text encoding can meaningfully decode it. See this SO answer for directions on how to read it at a low level, or use python-docx to access the document in a way that resembles what you see in Word.
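As a rough sketch of the low-level route, using only the standard library: open the archive with `zipfile`, read the main document part `word/document.xml`, and join the `w:t` text runs inside each `w:p` paragraph. The function name `docx_paragraphs` is hypothetical, and the sketch ignores tabs, line breaks, headers, footers and footnotes.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used for <w:p> (paragraph) and <w:t> (text) elements.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path):
    """Hypothetical helper: return the plain text of each paragraph in a .docx.

    `path` may be a filename or a binary file object, as zipfile.ZipFile accepts both.
    """
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read("word/document.xml")  # raw bytes; the XML parser decodes them
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(W_NS + "p"):
        # A paragraph's text is split across several runs; concatenate them.
        paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
    return paragraphs
```

With python-docx installed, the higher-level equivalent is roughly `[p.text for p in docx.Document(path).paragraphs]`.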

Answered by alexis; edited by Community.