The reason is that the .xlsx
file has a different encoding, not utf-8
. You'll need to use the original encoding to decode the file.
There's no guaranteed way of finding out the encoding of a file programmatically. I'm guessing you're making this application for general users and so you will keep encountering files with different and unexpected encodings.
A good way to deal with this is by trying to decode using multiple encodings, in case one fails. Example:
encodings = ['utf-8', 'iso-8859-1', 'windows-1251', 'windows-1252']
for encoding in encodings:
try:
decoded_file = self.request.files['1'][0]['body'].decode(encoding)
except UnicodeDecodeError:
# this will run when the current encoding fails
# just ignore the error and try the next one
pass
else:
# this will run when an encoding passes
# break the loop
# it is also a good idea to re-encode the
# decoded files to utf-8 for your purpose
decoded_file = decoded_file.encode("utf8")
break
else:
# this will run when the for loop ends
# without successfully decoding the file
# now you can return an error message
# to the user asking them to change
# the file encoding and re upload
self.write("Error: Unidentified file encoding. Re-upload with UTF-8 encoding")
return
# when the program reaches here, it means
# you have successfully decoded the file
# and you can access it from `decoded_file` variable
Here's a list of some common encodings: What is the most common encoding of each language?