
We are building a complex website with i18n. To make translation easier, we store the translations in models. Our staff write and edit the translations via the Django admin. When a translation is complete, a management script writes the po-files and then runs Django's compilemessages for all of them. I know the po-files have to be written in UTF-8. But after opening the app I still get the error "'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)" when using languages with special characters, such as Spanish or French. What am I doing wrong?

Here is my (shortened) code:

import codecs
import os

from django.core.management.base import NoArgsCommand

# XLanguage, XTranslation and create_path come from the project (omitted here)


class Command(NoArgsCommand):

    def handle_noargs(self, **options):

        languages = XLanguage.objects.all()
        currPath = os.getcwd()

        for lang in languages:

            path = "{}/framework/locale/{}/LC_MESSAGES/".format(currPath, lang.langToplevel)

            # check and create path
            create_path(path)

            # add filename
            path = path + "django.po"

            # the with-block closes the file, so no explicit close() is needed
            with codecs.open(path, "w", encoding='utf-8') as file:

                # select all textitems for this language from XTranslation
                translation = XTranslation.objects.filter(langID=lang)

                for item in translation:

                    # check if menu-item
                    if item.textID.templateID:
                        msgid = u"menu_{}_label".format(item.textID.templateID.id)
                    else:
                        msgid = u"{}".format(item.textID.text_id)

                    trans = u"{}".format(item.textTranslate)

                    text = u'msgid "{}"      msgstr "{}"\n'.format(msgid, trans)

                    file.write(text)

Traceback:

Environment:

Request Method: GET
Request URL: http://127.0.0.1:8000/

Django Version: 1.7
Python Version: 3.4.0
Installed Applications:
('django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'simple_history',
'datetimewidget',
'payroll',
'framework',
'portal',
'pool',
'billing')
Installed Middleware:
('django.contrib.sessions.middleware.SessionMiddleware',
'django.middleware.common.CommonMiddleware',
'django.middleware.csrf.CsrfViewMiddleware',
'django.contrib.auth.middleware.AuthenticationMiddleware',
'django.contrib.auth.middleware.SessionAuthenticationMiddleware',
'django.contrib.messages.middleware.MessageMiddleware',
'django.middleware.clickjacking.XFrameOptionsMiddleware',
'simple_history.middleware.HistoryRequestMiddleware')


Traceback:
File "c:\python34\lib\site-packages\django\core\handlers\base.py" in get_response
  111. response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "E:\python\sarlex\framework\views.py" in init
   34. activate("de")
File "c:\python34\lib\site-packages\django\utils\translation\__init__.py" in activate
  145. return _trans.activate(language)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in activate
  225. _active.value = translation(language)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in translation
  210. current_translation = _fetch(language, fallback=default_translation)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in _fetch
  195. res = _merge(apppath)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in _merge
  177. t = _translation(path)
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in _translation
  159. t = gettext_module.translation('django', path, [loc], DjangoTranslation)
File "c:\python34\lib\gettext.py" in translation
  410. t = _translations.setdefault(key, class_(fp))
File "c:\python34\lib\site-packages\django\utils\translation\trans_real.py" in __init__
  107. gettext_module.GNUTranslations.__init__(self, *args, **kw)
File "c:\python34\lib\gettext.py" in __init__
  160. self._parse(fp)
File "c:\python34\lib\gettext.py" in _parse
  300. catalog[str(msg, charset)] = str(tmsg, charset)

Exception Type: UnicodeDecodeError at /
Exception Value: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)
sascha2014
  • For the first import at the very top of the file, try `from __future__ import unicode_literals`. Either that, or prefix the strings with `u`, as in `u"menu_{}_label"`. I'm not quite sure if you are opening them programmatically or not. There's a possibility that since you are passing in strings, and not unicode, it's throwing something off. If this happens to work, I'll submit this comment as an answer. – Michael B Nov 28 '14 at 07:46

3 Answers


Whenever you have an encoding/decoding error, it means you are handling Unicode incorrectly. This most often happens when you mix Unicode with byte strings, which prompts Python 2.x to implicitly decode your byte strings to Unicode with the default encoding, 'ascii'. That is why you get errors like this:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1: ordinal not in range(128)

The best way to avoid these errors is to work ONLY with Unicode within your program, i.e. you have to explicitly decode all input byte strings to Unicode with 'utf-8' (or another Unicode encoding of your choice), and mark the strings in your code as type Unicode with the prefix u''. When you write out to a file, explicitly encode them back to byte strings with 'utf-8'.

Specifically to your code, my guess is either

msgid = "menu_{}_label".format(item.textID.templateID.id)

or

text = 'msgid "{}"      msgstr "{}"\n'.format(msgid, item.textTranslate)

is throwing the error. Try making msgid and text Unicode strings instead of byte strings by declaring them like so:

msgid = u"menu_{}_label".format(item.textID.templateID.id)

and

text = u'msgid "{}"      msgstr "{}"\n'.format(msgid, item.textTranslate)

I'm assuming that the values of item.textID.templateID.id and item.textTranslate are both in Unicode. If they aren't (i.e. they are byte strings), you'd have to decode them first.

Lastly, this is a very good presentation on how to handle Unicode in Python: http://nedbatchelder.com/text/unipain.html. I highly recommend you go through it if you do a lot of i18n work.

EDIT 1: since item.textID.templateID.id and item.textTranslate are byte strings, your code should be:

for item in translation:
    # check if menu-item
    if item.textID.templateID:
        msgid = u"menu_{}_label".format(item.textID.templateID.id.decode('utf-8'))
    else:
        msgid = item.textID.text_id.decode('utf-8')  # you don't need to do u"{}".format() here since there's only one replacement field

    trans = item.textTranslate.decode('utf-8')  # same here, no need for u"{}".format()
    text = u'msgid "{}"      msgstr "{}"\n'.format(msgid, trans)  # msgid and trans should both be Unicode at this point
    file.write(text)

EDIT 2: Original code was in Python 3.x, so all of the above is NOT applicable.
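
For Python 3, the failing call is visible in the last frame of the traceback: gettext.py decodes every catalog entry with `str(msg, charset)`. The snippet below is only a sketch of that failure mode, assuming the charset ended up as 'ascii'; the sample word is made up and does not come from the question's data.

```python
# UTF-8 bytes as they would sit in a compiled catalog (hypothetical sample word)
raw = "más".encode("utf-8")  # b'm\xc3\xa1s'

try:
    # Same call shape as gettext.py line 300, with an 'ascii' charset
    str(raw, "ascii")
    message = None
except UnicodeDecodeError as exc:
    message = str(exc)

# Decoding the very same bytes as UTF-8 works
decoded = str(raw, "utf-8")

print(message)
print(decoded)
```

So the bytes on disk can be perfectly valid UTF-8 and still fail, as long as the reader is told (or assumes) that they are ASCII.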

oxymor0n
  • Thanks for your answers, and particularly oxymor0n for the link to the presentation. I added u as you recommended. I also checked item.textID.templateID.id and item.textTranslate with type() and they both return str. I also checked "text" before writing it to the file and it's str too, but I still get the same error – sascha2014 Nov 28 '14 at 09:28
  • The type of item.textID.templateID.id, item.textTranslate, and text should all be Unicode, NOT str. If they are of type str, decode your strings with 'utf-8' first. Let me know if they still throw exceptions. – oxymor0n Nov 28 '14 at 10:19
  • sorry to bother again, but I can not progress. I changed my code (see edited code above) But it still doesn't work :-( I guess I am still doing something wrong, 'cause I am new to python – sascha2014 Nov 28 '14 at 11:50
  • I am using Python 3.4.2, and decode is gone because Unicode is the default for strings (which is why I don't understand the problem I am having) – sascha2014 Nov 28 '14 at 21:36
  • oh my, you should have said that you are using 3.x from the start. I was assuming that you use 2.x. Could you give us the whole traceback of the error? – oxymor0n Nov 29 '14 at 03:19
  • Sorry, you are totally right. I forgot to mention it at the beginning. It won't happen again. I put the traceback above. – sascha2014 Nov 29 '14 at 07:05
  • @sascha2014 well ok so if you are using 3.x then the `u""` literals are not needed, you can remove them from your code. You also don't need to decode the variables since in Python 3.x the `str` type IS Unicode. Now, are you sure that traceback is generated from the snippet of code that you showed us? Because I see nothing in that traceback that references your code. – oxymor0n Nov 29 '14 at 07:49
  • Yes, this error shows up when I start my app and Spanish or French is activated in the project settings (LANGUAGE_CODE = 'es') or with activate("es") in my views.py. – sascha2014 Nov 29 '14 at 08:27
  • Is the code snippet you provided involved in the `activate()` function? I don't see it being called in the traceback, which means whatever is throwing your exception might be elsewhere – oxymor0n Nov 29 '14 at 18:06
  • You can find the activate() function in traceback: line 34: activate ("de"). – sascha2014 Nov 29 '14 at 21:32
  • Maybe this helps: I created a completely new project and app and ran syncdb. Everything works fine. Then I copied the folder "locale" with the language files into the new app, set LANGUAGE_CODE = "de" in settings.py and called syncdb again. I get the error again. It seems the problem is not inside my program, it's in the po-file. I opened the po-file with BabelPad (UTF editor) and it says it's UTF-8 LF – sascha2014 Nov 30 '14 at 07:36
  • I found the solution and posted the answer. Thank you so much for your support. – sascha2014 Dec 01 '14 at 09:04

I had the same error and this helped me https://stackoverflow.com/a/23278373/2571607

Basically, for me, it's an issue with python. My solution is, open C:\Python27\Lib\mimetypes.py

replace

default_encoding = sys.getdefaultencoding()

with

if sys.getdefaultencoding() != 'gbk':  
    reload(sys)  
    sys.setdefaultencoding('gbk')  
default_encoding = sys.getdefaultencoding() 
Yue Y

Solution found! I was writing msgid and msgstr on one line, separated by spaces, to make it more readable. This works in English but throws an error in languages with special characters like Spanish or French. After writing msgid and msgstr on two separate lines, it works.
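
A minimal sketch of the two-line layout, not the exact code from the management command. The names `po_escape`, `format_po_entry` and `PO_HEADER` are hypothetical. The header entry goes beyond what is described above and is an assumption about what a robust po-file should contain: it is the usual way for a catalog to declare its charset, and Python's gettext falls back to 'ascii' when none is declared.

```python
def po_escape(value):
    """Escape backslashes, double quotes and newlines for a PO string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


# Header entry (empty msgid) that declares the catalog charset.
PO_HEADER = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    '"Content-Transfer-Encoding: 8bit\\n"\n'
    '\n'
)


def format_po_entry(msgid, msgstr):
    """Return one PO entry with msgid and msgstr on separate lines."""
    return 'msgid "{}"\nmsgstr "{}"\n\n'.format(po_escape(msgid), po_escape(msgstr))


print(PO_HEADER + format_po_entry("menu_1_label", "Español"))
```

In the command from the question, this would mean writing `PO_HEADER` once right after opening the file and then calling `file.write(format_po_entry(msgid, trans))` for each item.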

sascha2014