I am using Python 3.4 on Windows 10 and trying to work with filenames that use extended character sets. I am having issues with both Korean and Japanese characters. At first I had problems opening the files when I entered the names manually. When I have Python get the names through listdir:
print(os.listdir(dirname)[0])
This raises the exception:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 7-10: character maps to <undefined>
The characters in positions 7-10 are all Hangul (Korean). Strangely, this works if I run it in IDLE. Using sys.getfilesystemencoding(), I can't find out what encoding is being used. I have tried all sorts of encoding options and none seem to help. I assume this is some sort of issue with Microsoft still using MBCS for the console?
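For what it's worth, print() encodes its output with sys.stdout.encoding rather than with the filesystem encoding, so the two values are worth comparing side by side. A minimal sketch; the helper name describe_encodings is made up for illustration, and I am not claiming any particular values for a given console:

```python
import sys


def describe_encodings():
    """Return the encodings Python uses for filenames and for stdout."""
    return {
        # Encoding used to convert filenames between str and bytes.
        "filesystem": sys.getfilesystemencoding(),
        # Encoding print() uses when writing to stdout; may be None if
        # stdout was replaced by an object without that attribute.
        "stdout": getattr(sys.stdout, "encoding", None),
    }


if __name__ == "__main__":
    for name, value in sorted(describe_encodings().items()):
        print("{0}: {1}".format(name, value))
```

If the two values differ, that would be consistent with filenames being handled fine while printing them fails.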
Edit:
Interesting, if I do:
print("日本語")
It doesn't crash but just outputs:
???
However, if I make a file with the same name and do:
print(os.listdir(path)[0])
where the file is still called "日本語", it crashes with:
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>
I am running cmd as my console.
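The crash can be reproduced without a console by encoding the name with a charmap-based code page, and it can be avoided by escaping whatever that codec cannot represent. A minimal sketch, assuming the console uses something like cp437 (an assumption; I have not confirmed which code page cmd is using), with to_console_safe as a hypothetical helper name:

```python
def to_console_safe(text, encoding="cp437"):
    """Return text with characters the target encoding lacks escaped.

    The cp437 default is only an assumption about the console code page;
    pass sys.stdout.encoding to match the real console.
    """
    # backslashreplace turns unencodable characters into \uXXXX escapes
    # instead of raising UnicodeEncodeError.
    raw = text.encode(encoding, errors="backslashreplace")
    return raw.decode(encoding)


if __name__ == "__main__":
    name = "\u65e5\u672c\u8a9e"  # the filename from the question, 日本語
    try:
        name.encode("cp437")
    except UnicodeEncodeError as exc:
        # Same failure print() hits when stdout uses this code page.
        print("encode failed:", exc.reason)
    print(to_console_safe(name))
```

The escaped form is ugly, but it is pure ASCII, so printing it should not depend on the console's code page.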
Edit: I just realised that as long as I don't output the name to the console, Windows is able to open the file, provided I don't mess with the encoding (which was my problem before). I'll just make sure I use Qt whenever I want to print a result.
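Roughly, that approach looks like this: take the name exactly as os.listdir returns it, pass it straight to open(), and only show an ASCII-escaped form on the console. This is a sketch under assumptions: read_first_file is a hypothetical helper, the directory is assumed to contain at least one regular file, and the demo uses a temporary directory rather than my real files.

```python
import os
import tempfile


def read_first_file(dirname):
    """Open the first entry of dirname using the name listdir returned."""
    # sorted() makes the choice deterministic; listdir order is arbitrary.
    name = sorted(os.listdir(dirname))[0]
    # The str name goes to open() unchanged, with no manual re-encoding.
    with open(os.path.join(dirname, name), "rb") as handle:
        data = handle.read()
    return name, data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Create a file whose name is 日本語.txt, written as escapes.
        with open(os.path.join(tmp, "\u65e5\u672c\u8a9e.txt"), "wb") as f:
            f.write(b"hello")
        name, data = read_first_file(tmp)
        # ascii() escapes non-ASCII characters, so print() should not
        # raise UnicodeEncodeError whatever the console code page is.
        print(ascii(name), len(data))
```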