I'm having trouble with character encoding when writing to files from a script. I'm downloading some information from a website through its API. I have no control over the format I receive the information in, but here's a quick sample:
{'id': 12, 'name': "Kathy \xc3\x93 Fakename"}
{'id': 23, 'name': "Se\xc3\xb1or Murphy"}
(the names there are "Kathy Ó Fakename" and "Señor Murphy")
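For reference, the `\xc3\x93` and `\xc3\xb1` escapes are the UTF-8 byte sequences for Ó and ñ. Here is a minimal sketch of how the same bytes read under two different decodings. It assumes Python 3 bytes literals (my script just treats them as plain strings), and Windows-1252 is only my guess at an alternative encoding, not something I have confirmed is involved:

```python
# Sample names as raw bytes, copied from the API output above.
raw_names = [b"Kathy \xc3\x93 Fakename", b"Se\xc3\xb1or Murphy"]

# Decoding as UTF-8 yields the intended names.
as_utf8 = [name.decode("utf-8") for name in raw_names]

# Decoding the very same bytes as Windows-1252 (an assumed alternative)
# turns each accented letter into two unrelated characters.
as_cp1252 = [name.decode("cp1252") for name in raw_names]

print(as_utf8)
print(as_cp1252)
```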
This is mostly fine: when I write these to a plain file with no extension, the names come out with the correct characters.
However, I have two problems. I'm writing all this information into an HTML table, and when the file I'm writing to ends in .html, the wrong characters are written to the file: instead I end up with the names Kathy Ã“ Fakename and SeÃ±or Murphy. These incorrect characters also show up in the actual filename, even though the correct characters I want there are perfectly valid in filenames.
As far as I can tell, the only difference is the file extension, which confuses me, since I didn't expect Python to implicitly change what I write. The wrong characters are definitely in the HTML source as well, not just in how the page displays.
To demonstrate, this code:
import os

# `users` is the list of dicts sampled above
with open(os.path.abspath("Test.html"), 'w') as f:
    for user in users:
        f.write("{}: {}<br>".format(user['id'], user['name']))

with open(os.path.abspath("Test"), 'w') as f:
    for user in users:
        f.write("{}: {}\n".format(user['id'], user['name']))
results in:
Test:
12: Kathy Ó Fakename
23: Señor Murphy
Test.html:
12: Kathy Ã“ Fakename<br>
23: SeÃ±or Murphy<br>
What's causing the difference here?
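In case it helps narrow this down, a check along these lines should show whether the two files really contain different bytes, or whether only the program displaying them differs. It is just a sketch in Python 3 syntax: the temporary directory and the `write_users` helper are made up for the example, and the names are written as raw bytes so nothing gets re-encoded along the way.

```python
import os
import tempfile

# Sample records, with the names kept as the raw bytes the API returns.
users = [
    {"id": 12, "name": b"Kathy \xc3\x93 Fakename"},
    {"id": 23, "name": b"Se\xc3\xb1or Murphy"},
]

def write_users(path, line_end):
    # Hypothetical helper: write "id: name" records in binary mode.
    with open(path, "wb") as f:
        for user in users:
            f.write(str(user["id"]).encode("ascii") + b": " + user["name"] + line_end)

tmp_dir = tempfile.mkdtemp()
html_path = os.path.join(tmp_dir, "Test.html")
plain_path = os.path.join(tmp_dir, "Test")
write_users(html_path, b"<br>")
write_users(plain_path, b"\n")

# Read both files back in binary mode so no decoding happens.
with open(html_path, "rb") as f:
    html_bytes = f.read()
with open(plain_path, "rb") as f:
    plain_bytes = f.read()

# Normalise the separators so that only the name bytes could differ.
same_payload = html_bytes.replace(b"<br>", b"\n") == plain_bytes
print(same_payload)
print(repr(html_bytes))
```

If the bytes match, the extension is not changing what gets written, and the difference would have to come from whatever opens the file afterwards, for example a browser guessing a different encoding because the HTML has no charset declaration.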