
Related post: UnicodeDecodeError: 'charmap' codec can't decode byte X in position Y: character maps to <undefined>

Hi all, this is my first post and I am sooo sorry if I mess something up and this is a spammy or bad post, but I haven't found any solution to my troubles!

file=open("bookmarks_3_6_17.html",'r')
content=file.read()
print(content)

And this is what I get in return:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Dan\Anaconda3\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 111530: character maps to <undefined>

I did some looking and saw that the file may be encoded in UTF-8, which I thought was what Python assumes by default. I looked around and found nothing that makes this error go away. I even tried changing the encoding and passing the "encoding" argument to "open()", but I got the same error, just at a different position and involving a different encoding file.

Does anyone know how I can fix this? Please let me know if there is any further info I should provide!
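
For reference, the error in the traceback can be reproduced without the file. The sketch below assumes, purely for illustration, that the offending byte comes from a UTF-8 encoded right curly quote (U+201D); that is one common source of a stray 0x9d, but nothing in the post confirms it is what the bookmarks file contains.

```python
# Assumption for illustration: the file holds a UTF-8 encoded right curly
# quote (U+201D), whose three-byte encoding ends in 0x9d.
utf8_bytes = "\u201d".encode("utf-8")

# cp1252 assigns no character to byte 0x9d, so decoding these bytes with
# that codec raises the same UnicodeDecodeError as in the traceback.
try:
    utf8_bytes.decode("cp1252")
    cp1252_failed = False
except UnicodeDecodeError as exc:
    cp1252_failed = True
    # exc.object is the input bytes; exc.start indexes the first bad byte.
    failing_byte = exc.object[exc.start]

# Decoding the same bytes as UTF-8 succeeds.
decoded = utf8_bytes.decode("utf-8")
```

If the real file behaves the same way, the "position" in the error message is simply the offset of the first byte that the chosen codec cannot map.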

  • How are you trying to set the encoding? Python does **not** open files in UTF-8 by default. It uses your system locale instead, see the [`open()` function documentation](https://docs.python.org/3/library/functions.html#open), read everything you find mentioning *encoding*. For your Windows system, the default is Windows Codepage 1252 (basically Latin-1, but not quite). – Martijn Pieters Mar 06 '17 at 14:09
  • better use with open instead of file=open; or don't forget to close your file; the issue you have is that contents are unicode, search for that. – Drako Mar 06 '17 at 14:10
  • Hi Martijn, thanks for clarifying about the encoding, I will dig into that. I read somewhere that Python opens files in UTF-8 by default--guess that was wrong, more confusion for a newbie! I have set the html file to be UTF-8 and Python to read that encoding too, but still I have a problem. I will dig around the source you provided. Drako, why would it be better to use with open instead of file=open? I'm a serious newbie. – DangerRangerDan Mar 07 '17 at 09:51
  • I have tried changing the encoding of the html file itself and specifying its encoding when opening it, but all I get is the exact same error, just in a different place and with a different character. I'm at a total loss. – DangerRangerDan Mar 07 '17 at 11:07
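
A minimal sketch of what the comments above suggest: open the file in a `with` block and pass the encoding explicitly instead of relying on the locale default. The file name `sample_bookmarks.html` and its contents are hypothetical stand-ins for the real bookmarks file, and UTF-8 is an assumption about its actual encoding, which the thread never establishes.

```python
import os
import tempfile

# Hypothetical stand-in for bookmarks_3_6_17.html: a small UTF-8 file
# containing curly quotes.
sample_path = os.path.join(tempfile.mkdtemp(), "sample_bookmarks.html")
with open(sample_path, "w", encoding="utf-8") as f:
    f.write("<p>\u201cquoted\u201d</p>")

# Name the encoding explicitly rather than relying on the system locale.
# The with block closes the file even if read() raises.
with open(sample_path, "r", encoding="utf-8") as f:
    content = f.read()

# If the true encoding is unknown, errors="replace" swaps each undecodable
# byte for U+FFFD instead of raising, at the cost of losing those characters.
with open(sample_path, "r", encoding="cp1252", errors="replace") as f:
    lossy = f.read()
```

If an explicit `encoding="utf-8"` still fails on the real file, the file is probably not valid UTF-8 throughout, and `errors="replace"` can help locate the offending characters by searching the result for U+FFFD.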
