
I am using Python 3.4 on Windows 10 and trying to work with filenames that use extended character sets. I am having an issue with both Korean and Japanese characters. I had problems opening the files when I entered the names manually, so I had Python get the names through os.listdir and print the first one:

print(os.listdir(dirname)[0])

This raises the exception:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 7-10: character maps to <undefined>

Those characters in positions 7-10 are all Hangul (Korean). Strangely, this works if I run it in IDLE, and sys.getfilesystemencoding() doesn't tell me which encoding is being used for the output. I have tried all sorts of encoding options and none seem to help. I assume this is some sort of issue with Microsoft still deciding to use MBCS for the console?
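To narrow down where the error comes from, here is a minimal sketch of how the two encodings involved could be compared. The helper names `describe_encodings` and `can_encode` are my own, not from any library, and using `cp437` as a stand-in for a legacy console code page is an assumption; the actual code page of a given console may differ.

```python
import sys


def describe_encodings():
    """Return the encoding used for filenames and the one used for stdout."""
    return {
        "filesystem": sys.getfilesystemencoding(),
        # sys.stdout can be replaced by an object without an encoding attribute
        "stdout": getattr(sys.stdout, "encoding", None),
    }


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False
```

If `can_encode(name, describe_encodings()["stdout"])` is false for a filename, then `print(name)` would be expected to fail the same way, regardless of what the filesystem encoding reports.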

Edit:

Interesting, if I do:

print("日本語")

It doesn't crash but just outputs:

???

However, if I make a file with the same name and do:

print(os.listdir(path)[0])

Where this file is still called "日本語" it will crash with:

UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to <undefined>

I am running cmd as my console.
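One way to at least avoid the crash is to escape whatever the console encoding cannot represent before printing. This is only a sketch: `console_safe` is a hypothetical helper of mine, and falling back to ASCII when stdout reports no encoding is an assumption.

```python
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by escapes."""
    # Default to the encoding of stdout; fall back to ASCII if it is unknown.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # backslashreplace turns unencodable characters into \uXXXX sequences
    # instead of raising UnicodeEncodeError.
    return text.encode(encoding, errors="backslashreplace").decode(encoding)
```

With this, `print(console_safe(os.listdir(path)[0]))` should print escape sequences for the unsupported characters rather than raising. It does not make the console display the characters themselves.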

Edit: I just realised that as long as I don't print the name to the console, Windows is able to open the file, provided I don't mess with the encoding (which was my problem before). I'll just make sure I use Qt whenever I want to display a result.
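Roughly what that looks like, as a sketch: the name returned by os.listdir is passed straight to open() as a str and never goes through the console. `read_first_file` is a hypothetical helper name, and sorting the listing is my own addition to make the choice of file deterministic.

```python
import os


def read_first_file(dirname):
    """Open the first entry in dirname (sorted by name) and return (name, bytes)."""
    name = sorted(os.listdir(dirname))[0]
    # The str name is handed to the OS unchanged; no manual encode/decode.
    with open(os.path.join(dirname, name), "rb") as handle:
        return name, handle.read()
```

If the name has to be shown in cmd for debugging, `print(ascii(name))` prints an ASCII-only escaped form of it.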

jbc9
  • I've closed your post as a duplicate. Have a look at that post, especially to J.F. Sebastian's answer. – Martijn Pieters Sep 04 '15 at 18:23
  • That didn't seem to help. I'm still getting the same error. – jbc9 Sep 04 '15 at 18:30
  • So you are using `win_unicode_console` and have enabled it before printing and you still get that same error? Could you update your question with a [mcve] that illustrates the issue? Do tell us what you use as a console (`cmd.exe` presumably?). – Martijn Pieters Sep 04 '15 at 18:31
