I am using Python 2.7 on Windows 10 and am working with Korean text. My ultimate goal is to be able to import some Korean text, modify it and then write the new text to a file.
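Roughly, the workflow I have in mind looks like the sketch below. It assumes the files are UTF-8 encoded; the function names and the replacement step are placeholders I made up for illustration, not part of the library I am using. The Korean characters are written as \u escapes so the example does not depend on how the console handles typed input.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import io
import os
import tempfile


def modify_text(text):
    # Placeholder modification (hypothetical): replace the ending
    # U+B2E4 with U+C694, turning "gada" into "gayo".
    return text.replace("\ub2e4", "\uc694")


def rewrite_file(src_path, dst_path, encoding="utf-8"):
    # io.open decodes to unicode on read and encodes on write,
    # and behaves the same way on Python 2.7 and Python 3.
    with io.open(src_path, "r", encoding=encoding) as src:
        text = src.read()
    new_text = modify_text(text)
    with io.open(dst_path, "w", encoding=encoding) as dst:
        dst.write(new_text)
    return new_text


# Small round trip through a temporary directory (paths are illustrative).
tmp_dir = tempfile.mkdtemp()
src_file = os.path.join(tmp_dir, "input.txt")
dst_file = os.path.join(tmp_dir, "output.txt")
with io.open(src_file, "w", encoding="utf-8") as f:
    f.write("\uac00\ub2e4")
result = rewrite_file(src_file, dst_file)
```

This part never touches the terminal, so it only covers the file side of the goal.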
However, any Korean text I attempt to print to the terminal or write to a file ends up as a series of question marks.
For example, if I run the following:
>>> print u'가다'
I get:
??
I have tried printing both '가다' and u'가다'. I have also tried setting two different default encodings with sys.setdefaultencoding(ENCODING NAME): "utf-8" and "iso 8859-15".
I also tried print u'가다'.encode('utf-8') and print '가다'.encode('utf-8').
To see at what point the information is being lost, I used ord and got the following:
>>> ord(u'가')
63
ord('가') and ord(u'가') both return 63, which is the same as ord('?'), so whatever the problem is, it seems to happen the moment I press Enter. The same happens if I assign '가' or u'가' to a variable and take the ord of that variable.
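One further check I can think of is sketched below. It reports the encodings Python sees for the console and takes ord of the same character written as a \u escape, so nothing Korean has to be typed at the prompt. The helper name describe_console is my own, and I am only assuming that these are the relevant settings. If the escape form gives 44032 while the typed literal gives 63, that would suggest the character is being replaced on input by the console rather than by Python itself.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import sys


def describe_console():
    # Encodings as reported by the interpreter; getattr guards against
    # stdin/stdout being replaced by objects without an encoding attribute.
    return {
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "default": sys.getdefaultencoding(),
    }


# U+AC00 written as an escape, so the console never has to transmit it.
ga = "\uac00"
code_point = ord(ga)             # code point of the character
utf8_bytes = ga.encode("utf-8")  # its UTF-8 byte sequence

print(describe_console())
print(code_point)
print(repr(utf8_bytes))
```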
I have no problem getting Korean text to work in Python 3, but I am using a Korean language processing library that doesn't work in Python 3, so switching isn't an option in this situation. Any help would be much appreciated. Thank you in advance.