When writing one-off scripts for file management, I often use the print
function to verify that, e.g., the list of files I am operating on is what I intend it to be. Consider, for example:
import glob
for path in glob.glob("*.mp3"):
    print(path)
On Windows (including Cygwin when LANG=C is used, so this may apply to Unix too), this raises a UnicodeEncodeError if the file names include Unicode characters that the console encoding cannot represent.
In Cygwin, when using print(repr(path)) instead, unsupported characters are escaped as \uxxxx. In the Windows console, however, it still raises a UnicodeEncodeError.
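To make the failure reproducible without depending on a particular console, here is a minimal sketch that prints through an in-memory stream with a fixed encoding. The helper name print_with_encoding and the sample file name are made up for illustration. The stream uses strict error handling, which I assume matches what the Windows console does by default.

```python
import io

def print_with_encoding(text, encoding):
    """Hypothetical helper: print text through a stream with a fixed encoding."""
    buffer = io.BytesIO()
    # errors="strict" is assumed to mirror the console's default behavior.
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors="strict", newline="\n")
    print(text, file=stream)
    stream.flush()
    return buffer.getvalue()

name = "caf\u00e9 \u2329.mp3"  # made-up file name with non-ASCII characters

try:
    print_with_encoding(name, "ascii")
    failed = False
except UnicodeEncodeError:
    # Same exception type as the one raised by print(path) above.
    failed = True

# The same string goes through fine when the stream encoding can represent it.
utf8_bytes = print_with_encoding(name, "utf-8")
```

With "ascii" as the stream encoding the call fails, while "utf-8" succeeds, so the error depends on the output encoding rather than on the string itself.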
Partial Solution
The closest things to a solution I found were:
import sys
unicodestring = "Hello \u2329\u3328\u3281\u1219 World"
print(repr(unicodestring).encode("utf8").decode(sys.stdout.encoding))
# Breaks even supported characters
print(unicodestring.encode("unicode-escape").decode("ascii"))
Both of these are rather verbose for "quick and dirty" scripts, especially when the print call consists of multiple strings with possible non-ASCII content.
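To make the behavior I am after concrete, here is a rough sketch that escapes only the characters the target encoding cannot represent. It uses the standard "backslashreplace" error handler of str.encode. The helper name escape_unsupported is made up, and falling back to "ascii" when no stream encoding is available is my own assumption. It is still more to type than a bare print, so it illustrates the goal rather than settles it.

```python
import sys

def escape_unsupported(text, encoding=None):
    """Hypothetical helper: escape characters that `encoding` cannot represent."""
    if encoding is None:
        # Assumption: fall back to ASCII if stdout reports no encoding.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    # Unsupported characters become \xNN, \uNNNN or \UNNNNNNNN escapes;
    # everything the encoding can represent survives the round trip.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

unicodestring = "Hello \u2329\u3328\u3281\u1219 World"
print(escape_unsupported(unicodestring, "ascii"))
# prints: Hello \u2329\u3328\u3281\u1219 World
```

Unlike the "unicode-escape" variant, characters that the given encoding supports are left untouched; for example, with "cp437" an "é" is kept while "\u2329" is escaped.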
Similar questions exist for Python 2, but the solutions don't usually work with Python 3. Also, they are often equally verbose.