
I'm having a bizarre issue running this UTF-8 encoded script in Python 3.2. Python refuses to run it if it contains the Japanese hiragana character の (see the stack trace below).

Traceback (most recent call last):
  File "MyScript.py", line 20, in <module>
    print(no)
  File "C:\Python32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u306e'
                    in position 0: character maps to <undefined>

It runs fine without this one single character (there are other characters in the file as well), and I'm at a loss to explain why. Any help would be appreciated.

Here is a script that reproduces the error for me:

#!/usr/bin/env python
# coding=utf-8

import glob
import codecs
import os.path
from datetime import datetime, timedelta

assTemplate = \
r"""タイトル\N {time.year}年{time.month}月{time.day}日 {age}\N{place}"""

for mtsName in glob.glob('./*.MTS'):
    baseName = mtsName.lower().replace('.mts', '')
    mtsName = os.path.abspath(mtsName)

    # Get the time the video file was created.
    mtsTimestamp = datetime.fromtimestamp(os.stat(mtsName).st_ctime)

    no = '\u306e'  
    print(no)       # UnicodeEncodeError
    age = '生後'
    place = '自宅'
    print('自宅')

    # Generate the contents of the ASS file.
    assContents = assTemplate.format(time=mtsTimestamp, age=age, place=place)

    # Write the ASS file.
    print(assContents)

The reason for using Python 3.2 was that string formatting with Unicode strings was not working at all for me in Python 2.7.2.

Makoto
dythim
  • You haven't specified the output encoding, so you have no idea whether the output can handle Unicode or not. I suggest setting the output encoding to UTF-8. – tchrist Aug 21 '11 at 14:21
  • I'm not sure I quite understand your suggestion. Where should I specify the output encoding? I came across some suggestions to change environment variables to change the behavior of stdout, is that what you mean? – dythim Aug 21 '11 at 17:36
  • That's what I do but I may have the wrong cultural background. For predictability/reliability, I always arrange for my programs to use UTF-8 for input and output, unless and until I tell them otherwise. I don't like them doing different things run from different terminals or on different platforms, so I standardize on UTF-8. I can only tell you what I do for how I work; this might be wrongheaded for other people. I don't know. I only use Unix and Macs though. **I've heard the default Microsoft terminal program *still* can't do UTF-8 in 2011,** so you may need to run `putty` locally or something. – tchrist Aug 21 '11 at 18:58
  • Well, the python debugger I use does show UTF-8, and the final destination is a file, not terminal. Can you tell me what you actually do? I'd very much like to get this script working. It's something for my wife's parents. – dythim Aug 21 '11 at 19:16
  • I just set PYTHONIOENCODING to utf8 and it all works. I am using my folks Microsoft machine right now, and I find that although the Cygwin shell is being a pain in the butt, I can see the characters fine running putty with the encoding set to UTF-8. – tchrist Aug 21 '11 at 19:31
  • related: [Python, Unicode, and the Windows console](http://stackoverflow.com/questions/5419/python-unicode-and-the-windows-console) – jfs Jan 21 '12 at 06:03

2 Answers

2

You are trying to print a Unicode character to a terminal that uses cp1252. cp1252 does not support any Japanese characters at all, so this is not a problem with that one character; I bet you get the exact same error with any Japanese character.
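
To check this claim without touching the console, one can ask the codec directly. The following is a minimal sketch, assuming a hypothetical helper named `can_encode` (it is not part of the original script); it only prints booleans, so it is safe to run in a cp1252 console.

```python
def can_encode(text, encoding):
    """Return True if every character in text has a mapping in encoding.

    can_encode is a hypothetical helper introduced for illustration.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


no = '\u306e'      # the hiragana character from the traceback
taku = '\u5b85'    # the kanji from '自宅' in the script
e_acute = '\u00e9' # a Latin-1 character that cp1252 does cover

print(can_encode(no, 'cp1252'))       # False
print(can_encode(taku, 'cp1252'))     # False
print(can_encode(e_acute, 'cp1252'))  # True
print(can_encode(no, 'utf-8'))        # True
```

If both Japanese characters report `False`, the failure comes from the output encoding rather than from anything specific to の.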

Lennart Regebro
  • 167,292
  • 41
  • 224
  • 251
  • Does anybody know a good way to get Python to output non ASCII characters to a windows console? – David Heffernan Aug 20 '11 at 19:00
  • @David: non-ASCII is trivial. Non-cp1252, nope I have no idea. Maybe you can change the code page of the console? I think you could in Windows 95. :-) – Lennart Regebro Aug 20 '11 at 20:09
  • Even non ASCII is non trivial in my experience. I really can't believe how badly Python handles the windows console. – David Heffernan Aug 20 '11 at 20:16
  • See this SO question - http://stackoverflow.com/questions/3780378/how-to-display-japanese-kanji-inside-a-cmd-window-under-windows. But I never got this to work on Windows (for any program); 1 of the reasons I now use a Mac – PandaWood Aug 20 '11 at 22:50
  • @David: Everything in cp1252 should work without a problem, at least unless you change the font to something that can't handle those characters. Python really doesn't have many options in "handling" a console. If this doesn't work, it's the console that sucks. – Lennart Regebro Aug 21 '11 at 02:16
  • @Lennart: No, as I said in the original post it is only that character. The other characters in the script work fine. – dythim Aug 21 '11 at 03:44
  • @dythim: I notice you aren't printing any of the other characters in the script. And I also notice your console is using cp1252, which supports only Latin-1 + some other stuff. So, sorry, I don't believe you. `print('宅')` must give the exact same error. If it doesn't, your description of the problem is incorrect. – Lennart Regebro Aug 21 '11 at 08:09
  • @Lennart: I'm sorry you don't believe me. I took the first print() line out and re-ran the script. I was not mistaken. I get the output on the console exactly as I would expect. I would appreciate your help determining the actual problem if my description is in fact incorrect. So far the only symptom I have is what I have posted. – dythim Aug 21 '11 at 16:00
  • @David: unfortunately the Windows console's support for Unicode is not great. Programs such as Python and most other languages that use the standard C I/O library don't get to write any characters to the console that don't fit in the default code page, which varies from locale to locale but is never anything useful like UTF-8. You might try an alternative console such as IDLE, PyDev etc. – bobince Aug 21 '11 at 16:43
  • @bobince The windows console has supported Unicode for years and years. Python is char* oriented and thus does not like UTF-16. Windows console isn't going to change and Python devs have made it clear they have not been interested in bringing Python to Windows. That disdain appears to have softened of late but it's a sorry state of affairs. In an ideal world Windows would use UTF-8, but it doesn't. – David Heffernan Aug 21 '11 at 16:49
  • @dythim: I would have to debug it, but I don't have a modern Windows here so that will have to wait until in two weeks. It just doesn't make any sense that Python would suddenly switch to cp1252 for one specific character. – Lennart Regebro Aug 21 '11 at 21:36
  • @Lennart: It seems you are correct in that printing Japanese should not have worked, although it did for most, including a previous project. I won't worry too much as to the reasons why. As long as I don't print to the console, nothing goes wrong and I get my output files just fine. – dythim Aug 22 '11 at 18:12
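
Since the final destination is a file rather than the console, one option is to write the text with an explicit UTF-8 encoding and to print only an escaped form for debugging. This is a sketch under assumptions: `write_utf8`, `console_safe`, and the file name `example.ass` are hypothetical names chosen for illustration, not part of the original script.

```python
import os
import sys
import tempfile


def write_utf8(path, text):
    """Write text to path as UTF-8, independent of the console code page."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)


def console_safe(text, encoding=None):
    """Return text with characters the encoding lacks shown as \\uXXXX escapes."""
    # Fall back to ASCII if stdout has no usable encoding (an assumption).
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    return text.encode(encoding, errors='backslashreplace').decode(encoding)


no = '\u306e'

# Write to a temporary directory so the sketch does not touch real files.
out_path = os.path.join(tempfile.mkdtemp(), 'example.ass')
write_utf8(out_path, no)

# Prints the escaped form \u306e instead of raising UnicodeEncodeError.
print(console_safe(no, 'cp1252'))
```

The file receives the real character, while the console only ever sees text it can represent.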
-1

I had this problem too; my language is Vietnamese. I fixed it as follows: move the file cp1252.py out of the encodings folder into any other folder you like (or delete it). Don't worry that the encodings folder no longer contains cp1252.py. Next, make a copy of the utf-8 file in the encodings folder and rename the copy to cp1252.py.

Good luck!

My Yahoo nickname is phong_ux. If you need more help, I am willing to help you.

  • That sounds like a *really* broken solution. Sounds like you've in effect redefined CP1252. – derobert Jan 21 '12 at 12:21
  • That does not help. He gets the error because he prints a character to the console that the console does not support. Your solution does not make the console support it, and it still does not print the character. You may have removed the error, but it is still broken. Also, you broke the cp1252 codepage, so now you can't use it at all. – Lennart Regebro Jan 21 '12 at 19:31