I have some text encoded in UTF-8: 'Before – after.' It was fetched from the web, and the '–' character is the issue. If I copy and paste it into the interpreter and print it directly, it works:
>>> text = 'Before – after.'
>>> print text
Before – after.
But if I save it to a text file and print each line:
>>> for line in open('file.txt','r'):
...     print line
Before û after.
I'm pretty sure this is some sort of UTF-8 encode/decode error, but it is eluding me. I have tried decoding and re-encoding, but neither works:
>>> for line in open('file.txt','r'):
...     print line.decode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 7: invalid start byte
>>> for line in open('file.txt','r'):
...     print line.encode('utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 7: invalid start byte
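In case it helps narrow things down, here is a minimal sketch of how the raw bytes could be checked against a few candidate encodings. The sample bytes are an assumption reconstructed from the 0x96 at position 7 in the error message, and the candidate list (cp1252, cp437) and the helper name try_decodings are my own guesses, not something I have confirmed about the file:

```python
# Assumption: these bytes stand in for one line of file.txt, rebuilt from the
# error message (byte 0x96 at position 7). In practice they would come from
# open('file.txt', 'rb').read().
raw = b'Before \x96 after.'

def try_decodings(data, encodings=('utf-8', 'cp1252', 'cp437')):
    """Hypothetical helper: decode data with each candidate encoding.

    Returns a dict mapping encoding name to the decoded text, or to None
    when the bytes are not valid in that encoding.
    """
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

results = try_decodings(raw)
for name in sorted(results):
    # repr() keeps the output unambiguous regardless of console encoding
    print('%s: %r' % (name, results[name]))
```

If one of the candidates gives back the en dash while UTF-8 fails, that would suggest the file was not saved as UTF-8 in the first place.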