In Notepad, I created a file containing just an em dash and saved it with the "Unicode (big endian)"
encoding. Notepad displays the em dash correctly. When I open the file and read it like this in Python 3/IDLE:
open(file_path, encoding="UTF-16-BE").read()
I get this:
'\ufeff—'
Expressed as bytes, the file's contents are this:
b'\xfe\xff \x14'
Shouldn't it be handling the BOM and not displaying it? I looked at the available encodings for Python and there was nothing like a UTF_16_BE_SIG in there as there is for UTF_8_SIG. What is going on here and how do I handle it properly?
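
For reference, here is a minimal sketch that reproduces this without Notepad. It assumes the file holds exactly the four bytes shown above; the variable names are only illustrative. It also decodes the same bytes with the generic "utf-16" codec for comparison, which, as far as I can tell, uses the BOM to pick the byte order and then drops it:

```python
# Bytes of the file as shown above: BOM (FE FF) followed by U+2014 (20 14).
data = b"\xfe\xff\x20\x14"

# Explicit-endianness codec: the BOM is decoded as an ordinary U+FEFF character.
as_be = data.decode("utf-16-be")

# Generic codec: the BOM is read to determine byte order and is not returned.
as_generic = data.decode("utf-16")

print(ascii(as_be))       # BOM still present at the start of the string
print(ascii(as_generic))  # only the em dash remains
```

If that holds in general, passing encoding="UTF-16" to open() might be all that is needed for files that start with a BOM, but I would like to understand why the explicit big-endian codec behaves differently.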