All my Python source code is encoded in UTF-8 and declares this encoding at the top of each file.
But sometimes the u prefix before a unicode string literal is missing.
Example:
    Umlauts = "üöä"
The literal above is a bytestring containing non-ASCII characters, and it causes trouble (a UnicodeDecodeError) when it is combined with unicode strings.
I tried pylint and python -3, but neither gave me a warning.
I am looking for an automated way to find non-ASCII characters in bytestring literals.
My source code needs to support Python 2.6 and Python 2.7.
I get this well-known error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)
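The error comes from an implicit ASCII decode: when Python 2 combines a bytestring with a unicode string, it decodes the bytestring with the default ascii codec, and the first UTF-8 byte of ü (0xc3) is out of range. The minimal sketch below reproduces the same error explicitly; it runs on Python 2.6+ and Python 3, and the helper name ascii_decode_error is hypothetical.

```python
def ascii_decode_error(text):
    """Encode `text` as UTF-8 (what an unprefixed literal holds in a UTF-8
    encoded Python 2 file), then decode it with the ascii codec, as Python 2
    does implicitly. Return the UnicodeDecodeError, or None if it decodes."""
    data = text.encode("utf-8")
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return exc
    return None


err = ascii_decode_error(u"Umlauts = \u00fc\u00f6\u00e4")
print(err)
# 'ascii' codec can't decode byte 0xc3 in position 10: ordinal not in range(128)
```

The position differs from the message above only because the failing byte sits at a different offset in this sample string.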
BTW: this question is only about string literals in Python source code, not about strings read from files or sockets.
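One way to automate the check is sketched below with the standard tokenize module. This is a separate, tokenize-based technique, not the ast-based solution mentioned in the Solution section. It walks the STRING tokens of a source text and reports every literal whose prefix has no u/U but which contains a non-ASCII character. The function names find_nonascii_bytestrings and check_file are hypothetical, and the sketch assumes it is run under Python 3 as a separate lint step over the UTF-8 files. Tokenizing is more lenient than parsing, so Python 2 statements such as print "x" do not get in the way. Two known limitations: Python 3's tokenizer does not know the Python 2-only ur prefix, so ur"..." literals would be reported as false positives, and files that already use from __future__ import unicode_literals should be skipped.

```python
import io
import tokenize


def find_nonascii_bytestrings(source):
    """Return (row, col, literal) for every string literal in `source` that
    has no u/U prefix but contains a non-ASCII character.

    `source` is the decoded text of a Python 2 module (a unicode string).
    """
    hits = []
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        tok_type, tok_text, (row, col) = tok[0], tok[1], tok[2]
        if tok_type != tokenize.STRING:
            continue
        # The prefix (b, r, br, u, ...) is everything before the first quote.
        first_quote = min(i for i in (tok_text.find('"'), tok_text.find("'")) if i >= 0)
        prefix = tok_text[:first_quote].lower()
        if "u" in prefix:
            continue  # already a unicode literal
        if any(ord(ch) > 127 for ch in tok_text):
            hits.append((row, col, tok_text))
    return hits


def check_file(path):
    """Hypothetical helper: check one UTF-8 encoded source file."""
    with io.open(path, encoding="utf-8") as f:
        return find_nonascii_bytestrings(f.read())


sample = u'Umlauts = "\u00fc\u00f6\u00e4"\nOk = u"\u00fc\u00f6\u00e4"\nPlain = "abc"\n'
print([(row, col) for row, col, _ in find_nonascii_bytestrings(sample)])  # → [(1, 10)]
```

Rows are 1-based and columns 0-based, as reported by tokenize, so each hit can be turned into a file:line:column warning.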
Solution
- For projects that need to support Python 2.6+, I will use from __future__ import unicode_literals.
- For projects that need to support Python 2.5, I will use the solution from thg435 (based on the ast module).
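With the future import from the solution, an unprefixed literal becomes a unicode string on Python 2.6/2.7; the import does not exist on 2.5, hence the separate approach there. A minimal sketch follows; on Python 3 the import is accepted and has no effect, because literals are already unicode there. The import must be the first statement of the module (only the coding comment and a docstring may precede it).

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

Umlauts = "üöä"  # unicode on Python 2.6/2.7 thanks to the future import

# Three characters, not the six UTF-8 bytes a Python 2 bytestring would hold.
assert Umlauts == u"\u00fc\u00f6\u00e4"
print(len(Umlauts))  # → 3
```

Literals that really must stay bytestrings then need an explicit b prefix (b"..."), which Python 2.6+ accepts.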