
I am downloading data from a MySQL database, and some of the data is in Korean. When I try to print a string before putting it in a Qt table, the Windows Command Prompt reports:

File "C:\Python27\lib\encodings\cp437.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 0-2: character maps to (undefined)

However, when I run the code from IDLE, it prints the Korean text fine. This caused me a lot of headaches while debugging, because I normally run the program by double-clicking the Python file in its folder. Once I ran it in IDLE, it turned out that everything works.

Is something wrong with my Python installation, my Windows installation, or the Python code that prints the characters? I assumed it wouldn't be the Python code, since it works in IDLE. Also, using a special function to print on Windows seems bad, because it limits the code's portability to other operating systems (or will every OS have this problem?).

user-2147482637
  • Did you try using different codecs, like utf-8 or utf-16? – Natecat May 14 '14 at 05:17
  • When connecting to the database I use charset='utf8', and the string is inserted into my table correctly, so I assume it is using UTF-8. But how would I convert a string in Python only when printing? – user-2147482637 May 14 '14 at 05:21
  • Unfortunately, UTF-8 is badly supported in the Windows console. Maybe this will be useful: [http://stackoverflow.com/questions/388490/unicode-characters-in-windows-command-line-how] – NorthCat May 14 '14 at 08:24

1 Answer


IDLE is based on tkinter, which is based on Tcl/Tk, which supports the entire Basic Multilingual Plane (BMP), although not the supplementary planes that hold other characters. On Windows, the Python interactive interpreter runs in the same console window used by Command Prompt. That console supports only code-page subsets of the BMP, sometimes only 256 of the 2^16 characters.
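To make the code-page limitation concrete, here is a small sketch that checks which codecs can represent a three-character Korean string. The helper name can_encode and the sample text are my own choices for illustration, not anything from the question's code.

```python
# -*- coding: utf-8 -*-
# Sample text (an assumption for illustration): three Hangul syllables from the BMP.
text = u"\ud55c\uad6d\uc5b4"


def can_encode(s, codec):
    """Return True if every character of s can be encoded with the given codec."""
    try:
        s.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# cp437 is the code page named in the traceback; cp949 is the Korean code page.
results = {codec: can_encode(text, codec) for codec in ("cp437", "cp949", "utf-8")}
print(results)
```

The failure for cp437 is the same UnicodeEncodeError as in the traceback: print has to encode the string for the console, and that code page has no slot for these characters.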

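Your traceback shows that the console is using code page 437, which has no Korean characters. The code page that supports both ASCII and Korean is 949. (Easy Google search.) In Command Prompt, chcp 949 should switch to that code page. If you then start Python from that window, you should be able to display Korean characters.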
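If changing the code page is not an option, and you only need debugging output that never crashes, one portable workaround is to replace whatever the console cannot show before printing. This is only a sketch under my own assumptions: console_safe is a hypothetical helper name, and falling back to ASCII when the output encoding is unknown is a choice I made, not a requirement.

```python
# -*- coding: utf-8 -*-
import sys


def console_safe(text, encoding=None):
    """Return text with characters the target encoding lacks replaced by '?'."""
    # sys.stdout.encoding can be None (for example when output is piped),
    # so fall back to ASCII in that case. This fallback is an assumption.
    encoding = encoding or getattr(sys.stdout, "encoding", None) or "ascii"
    # Encode with errors="replace", then decode back to a text string.
    return text.encode(encoding, "replace").decode(encoding)


korean = u"\ud55c\uad6d\uc5b4"
print(console_safe(korean))
```

On a console that can display Korean the text comes through unchanged; elsewhere each unsupported character becomes a question mark instead of raising an exception. The same code runs on any OS, so it does not tie the program to Windows.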

Terry Jan Reedy