
Python code that runs on my development (local) machine, but fails after deploying to App Engine:

First line of my file:

# -*- coding: utf8 -*-O

Later in the code:

s1 = u'Ismerőseid'
logging.info(s1)
s2 = s1 + u':' + s1
logging.info(s2)
logging.info("%s,%s", s1, s2)

In Dev (localhost):

INFO     2012-12-18 04:01:17,926 AppRun.py:662] Ismerőseid,
INFO     2012-12-18 04:01:17,926 AppRun.py:664] Ismerőseid:Ismerőseid
INFO     2012-12-18 04:01:17,926 AppRun.py:665] Ismerőseid,Ismerőseid. Ó,

On App Engine, after deploying and running:

I 2012-12-21 06:52:07.730 
É, Á, Ö, Ü. Ó,

E 2012-12-21 06:52:07.736

Traceback (most recent call last):
  File "....", line 672, in xxxx
    s3 = s1 + u':' + s1
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 5: ordinal not in range(128)

I have tried various combinations of encoding and decoding. I also ran chardet on the pasted string 'Ismerőseid', and it gives me {'confidence': 0.7402600692642154, 'encoding': 'ISO-8859-2'}

Any help is greatly appreciated!

user1055761
  • `s1` is not unicode on the GAE then; decode it to unicode first, using the correct encoding. – Martijn Pieters Dec 21 '12 at 12:16
  • If you use non-ASCII characters in your source code, you must specify the file's encoding on the first or second line of the file (see http://stackoverflow.com/questions/728891/correct-way-to-define-python-source-code-encoding ). (Your local machine has its locale set in a way that makes it assume UTF-8 instead of ASCII, which is why it works there, but you should never rely on this behavior.) – Wooble Dec 21 '12 at 13:45
  • I have a # -*- coding: utf8 -*-O in my file. And I am assuming that this file is run thru the App Engine in the same way as it is run when I run it on my local box (thru eclipse) thru the ${GOOGLE_APP_ENGINE}/dev_appserver.py. Thanks for your help in any case ! – user1055761 Dec 22 '12 at 14:32
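
The error in the traceback is what Python 2 raises when a byte string (`str`) holding non-ASCII bytes is combined with a `unicode` object: the byte string is implicitly decoded as ASCII first. Below is a minimal sketch of that mechanism, written so that it also runs on Python 3 by making the decode explicit. The byte values assume the text is UTF-8 encoded, which the question does not confirm for the string that actually failed.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Assumption: these are the UTF-8 bytes of 'Ismerőseid', i.e. what an
# unprefixed literal in a UTF-8 source file holds under Python 2.
raw = b'Ismer\xc5\x91seid'

# Python 2 performs this ASCII decode implicitly when `raw` is
# concatenated with a unicode object; here it is spelled out.
try:
    raw.decode('ascii')
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding explicitly with the right encoding gives a text object,
# and concatenation with other text then works.
s1 = raw.decode('utf-8')
s2 = s1 + ':' + s1
```

Here `error` ends up holding a `UnicodeDecodeError` pointing at the first non-ASCII byte, while `s2` is built without problems once `s1` is real text.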

1 Answer


Put these 3 lines at the top of your Python 2.7 code to use unicode:

#!/usr/bin/python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

import logging

# And this code will not give you any problems

s1 = 'É, Á, Ö, Ü. Ó,'
logging.info(s1)
s2 = s1 + ':' + s1
logging.info("%s,%s", s1, s2)

And never use str() unless you really need to!

And read this blog post from Nick Johnson: http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python. This was before Python 2.7. He did not use from __future__ import unicode_literals, which makes using unicode with Python so easy.

voscausa
  • I don't have from __future... in there, just the coding: utf-8. Let me add that in and see if it makes a difference. Thanks ! – user1055761 Dec 22 '12 at 14:24
  • This did not solve it either. I continue to get the error. Any time the character set is beyond 255 (extended ASCII), string operations seem to fail on GAE. The Python operations try to encode to ASCII and fail. Do you have a working example on GAE for this? – user1055761 Dec 23 '12 at 02:43
  • What operations? And have a look at this article from Nick Johnson: http://blog.notdot.net/2010/07/Getting-unicode-right-in-Python – voscausa Dec 23 '12 at 02:52
  • The post from Nick helped. My issue was that all my multi-language literals were stored in a separate auto-generated file. The strings were not prefixed by 'u', so they were str objects, and all string operations were failing since they had code points greater than 255. Python or Eclipse should complain when you define such string literals. Your pointers helped solve the problem and enabled me to establish a better understanding of how Python handles unicode literals. Thanks!! – user1055761 Dec 23 '12 at 19:23
  • Can you do a simple edit on your answer, so that I can vote it up please ! – user1055761 Dec 23 '12 at 19:24
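
For the situation described in the last comments (literals living in a separate generated file without the `u` prefix), one option besides regenerating the file is to decode every value once when the table is loaded. This is only a sketch: `GENERATED`, `decode_table` and `MESSAGES` are hypothetical names, and UTF-8 is an assumption about how the generated file was saved.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Hypothetical stand-in for the generated file: under Python 2, unprefixed
# literals in a UTF-8 source file are byte strings like these.
GENERATED = {
    'friends': b'Ismer\xc5\x91seid',
    'ok': b'OK',
}


def decode_table(table, encoding='utf-8'):
    """Return a copy of `table` with every byte-string value decoded to text."""
    decoded = {}
    for key, value in table.items():
        if isinstance(value, bytes):
            value = value.decode(encoding)
        decoded[key] = value
    return decoded


# Decode once at load time; everything downstream handles text only.
MESSAGES = decode_table(GENERATED)
label = MESSAGES['friends'] + ':' + MESSAGES['ok']
```

Decoding at the boundary keeps byte strings out of the rest of the program, so later concatenation and formatting never trigger an implicit ASCII decode.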