Can someone explain to me why Python behaves this way?
Let me explain.
BACKGROUND
I have a Python installation and I want to use some characters that aren't in the ASCII table.
So I changed Python's default encoding.
I save every string in a .py file, like this: '_MAIL_TITLE_': u'Бронирование номеров',
Now, with a method that replaces my dictionary keys, I want to insert my strings into an HTML template dynamically.
I put this in the HTML page's head:
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
...... <!-- Some Css's -->
</head>
Unfortunately, after those replacements my HTML document comes back with some wrong characters (unconverted? misconverted?).
So I opened a terminal to try to sort things out:
1 - Python 2.4.6 (#1, Jan 27 2012, 15:41:03)
2 - [GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2
3 - Type "help", "copyright", "credits" or "license" for more information.
4 - >>> import sys
5 - >>> sys.getdefaultencoding()
6 - 'utf-8'
7 - >>> u'èéòç'
8 - u'\xe8\xe9\xf2\xe7'
9 - >>> u'èéòç'.encode('utf-8')
10 - '\xc3\xa8\xc3\xa9\xc3\xb2\xc3\xa7'
11 - >>> u'è'
12 - u'\xe8'
13 - >>> u'è'.encode()
14 - '\xc3\xa8'
QUESTION
Take a look at lines 7-10.
Isn't that weird? If my Python's default encoding is utf-8 (line 6), why does it convert that string (line 7) differently from the way line 9 does?
Now take a look at lines 11-14 and their output.
Now I'm totally confused!
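One way to read lines 7-10 is that they may not disagree at all: line 8 is the repr of a unicode object, which lists code points, while line 10 is the byte string produced by encoding those code points. A minimal sketch of that distinction follows. It assumes a current Python 3 (where Python 2's `str` corresponds to `bytes` and `unicode` to `str`), and it uses escape sequences so that it does not depend on any terminal or file encoding.

```python
# Four characters written as code-point escapes, so the result does not
# depend on how the terminal or the source file is encoded.
text = u'\xe8\xe9\xf2\xe7'   # the characters e-grave, e-acute, o-grave, c-cedilla

# A unicode object stores code points; its repr shows those, not bytes.
code_points = [hex(ord(ch)) for ch in text]

# Encoding is the separate step that turns code points into bytes.
utf8_bytes = text.encode('utf-8')      # two bytes per character here
latin1_bytes = text.encode('latin-1')  # one byte per character here

print(code_points)
print(repr(utf8_bytes))
print(repr(latin1_bytes))
```

Under this reading, `\xe8` in line 8 is the code point U+00E8, and `\xc3\xa8` in line 10 is the same character after UTF-8 encoding.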
THE HINT
So I tried changing my terminal's input encoding (previously ISO-8859-1, now UTF-8), and something changed:
1 - Python 2.4.6 (#1, Jan 27 2012, 15:41:03)
2 - [GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2
3 - Type "help", "copyright", "credits" or "license" for more information.
4 - >>> import sys
5 - >>> sys.getdefaultencoding()
6 - 'utf-8'
7 - >>> u'èéòç'
8 - u'\xc3\xc3\xa8\xc3\xa9\xc3\xb2\xc3\xa7'
9 - >>> u'èéòç'.encode('utf-8')
10 - '\xc3\xa8\xc3\xa9\xc3\xb2\xc3\xa7'
11 - >>> u'è'
12 - u'\xe8'
13 - >>> u'è'.encode()
14 - '\xc3\xa8'
So explicit encoding works independently of the input encoding (or so it seems to me, but I've been stuck on this for days, so maybe I've confused myself).
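A sketch of the general mechanism that could be at play here, not a reproduction of the sessions above: the terminal sends bytes, and the interpreter has to guess which encoding they are in before it can build a unicode object. The example assumes a current Python 3 and simulates the terminal input with a hard-coded byte string.

```python
# Bytes that a UTF-8 terminal sends when a single e-grave is typed.
typed = b'\xc3\xa8'

# If the interpreter decodes them as UTF-8, the result is one character.
as_utf8 = typed.decode('utf-8')

# If it assumes ISO-8859-1 instead, every byte becomes its own character.
as_latin1 = typed.decode('latin-1')

# Explicit encoding is deterministic for a given unicode object, whatever
# the terminal is set to; only the object that was built differs.
print(repr(as_utf8), repr(as_utf8.encode('utf-8')))
print(repr(as_latin1), repr(as_latin1.encode('utf-8')))
```

The second print shows a double-encoded result, which is the usual sign that the bytes were decoded with the wrong encoding somewhere upstream.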
WHERE IS THE SOLUTION??
Comparing line 8 of the BACKGROUND session with line 8 of the HINT session, you can see that the unicode objects created are different.
So I started thinking about it.
What did I conclude? Nothing.
Nothing except that maybe my encoding problems lie in the encoding of the .py file when I save it (the file that contains all the UTF-8 characters to be inserted into the HTML document).
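If the .py file's encoding is the suspect, the source encoding declaration described in PEP 263 is the relevant knob: it tells Python how to decode the bytes inside `u'...'` literals. The sketch below assumes a current Python 3 and compiles two in-memory sources that contain identical bytes but declare different encodings. The helper name `run` is made up for this illustration.

```python
# The same two bytes (0xC3 0xA8) sit between the quotes in both sources;
# only the declared source encoding differs.
literal = b"s = u'\xc3\xa8'\n"
src_utf8 = b"# -*- coding: utf-8 -*-\n" + literal
src_latin1 = b"# -*- coding: latin-1 -*-\n" + literal


def run(source_bytes):
    """Hypothetical helper: compile source bytes and return the value of s."""
    namespace = {}
    exec(compile(source_bytes, "<demo>", "exec"), namespace)
    return namespace["s"]


print(repr(run(src_utf8)))    # one character: the bytes were read as UTF-8
print(repr(run(src_latin1)))  # two characters: each byte read on its own
```

So a file saved as UTF-8 but interpreted as Latin-1 would produce unicode objects whose "characters" are really the individual UTF-8 bytes.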
THE "REAL" CODE
The code does nothing special: it opens an HTML template, reads it into a string, replaces the placeholders with unicode strings (UTF-8 encoded? I hope so) and saves the result to another file that will be viewed over the Internet (yes, my "landing" page declares UTF-8 in its head). I don't have the code here because it is scattered across several files, but I'm sure of the program's workflow (I traced it).
FINAL QUESTION
In light of this, does anybody have any idea how to make my code work? Ideas about Unix file encoding? Or .py file encoding? How can I change the encoding to make my code work?
LAST HINT
If, before substituting the placeholders with the utf-8 objects, I insert a
utf8Obj.encode('latin-1')
my document displays perfectly on the Internet!
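One plausible explanation for this, offered as an assumption rather than a diagnosis: if the unicode objects were built by decoding UTF-8 bytes as Latin-1, then encoding them back to Latin-1 restores the original UTF-8 bytes exactly. A genuine Cyrillic unicode string, by contrast, cannot be encoded to Latin-1 at all. The sketch assumes a current Python 3 and uses part of the mail title, written with escapes.

```python
# First word of the mail title from the question, as code-point escapes.
title = u'\u0411\u0440\u043e\u043d\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0435'

# The bytes a UTF-8 encoded file or page would contain for it.
utf8_bytes = title.encode('utf-8')

# Simulate the suspected accident: UTF-8 bytes decoded as Latin-1.
misdecoded = utf8_bytes.decode('latin-1')

# Encoding the mis-decoded text as Latin-1 undoes the accident byte for byte.
recovered = misdecoded.encode('latin-1')

# A correctly decoded Cyrillic string has no Latin-1 representation.
try:
    title.encode('latin-1')
    genuine_fits_latin1 = True
except UnicodeEncodeError:
    genuine_fits_latin1 = False

print(recovered == utf8_bytes, genuine_fits_latin1)
```

If that is what is happening, `encode('latin-1')` works only because it cancels an earlier wrong decode, and the cleaner fix would be to find where that decode occurs.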
Thanks to those who answer.
EDIT1 - DEVELOPMENT WORKFLOW
OK, here is my development workflow:
I have a CVS repository for the project. The project lives on a CentOS server, which is a 64-bit machine. I develop the code on Windows 7 (64-bit) with Eclipse. Every modification is committed ONLY with CVS commit. The code is executed on the CentOS machine, which uses this Python:
Python 2.4.6 (#1, Jan 27 2012, 15:41:03)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-51)] on linux2
I set up Eclipse this way: PREFERENCES -> GENERAL -> WORKSPACE -> TEXT FILE ENCODING : UTF-8
A Zope/Plone application runs on the same server and serves some PHP pages. The PHP pages call some Python methods (the application logic) through WS located on the Zope/Plone "server". That server interfaces directly with the application logic.
That's all.
EDIT2
This is the function that does the replace:
def _fillTemplate(self, buf):
    """_fillTemplate(buf)-->str
    Return the document with the fields replaced using dict_template.
    """
    try:
        for k, v in self.dict_template.iteritems():
            if not isinstance(v, unicode):
                v = str(v)
            else:
                v = v.encode('latin-1')  # In that way it works, but why?
            buf = buf.replace(k, v)
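For comparison, a common alternative to encoding each value separately is to decode the template once, do every replacement on unicode text, and encode once at the end. This is a minimal sketch under stated assumptions, not a drop-in replacement for the method above: it targets a current Python 3, the function name `fill_template` and its parameters are hypothetical, and it assumes the template file is UTF-8.

```python
def fill_template(template_bytes, replacements, encoding='utf-8'):
    """Hypothetical helper: fill placeholders and return encoded bytes."""
    # Decode once, so every replacement happens on text, not on bytes.
    text = template_bytes.decode(encoding)
    for key, value in replacements.items():
        if isinstance(value, bytes):
            # Byte values are assumed to use the same encoding as the template.
            value = value.decode(encoding)
        elif not isinstance(value, type(u'')):
            # Numbers and other objects are converted to text.
            value = u'%s' % (value,)
        text = text.replace(key, value)
    # Encode once, to the charset declared in the page's meta tag.
    return text.encode(encoding)


template = b'<title>_MAIL_TITLE_</title><p>_ROOMS_</p>'
values = {
    u'_MAIL_TITLE_': u'\u0411\u0440\u043e\u043d\u0438\u0440\u043e\u0432\u0430\u043d\u0438\u0435',
    u'_ROOMS_': 2,  # made-up placeholder, for illustration only
}
page = fill_template(template, values)
print(repr(page))
```

The design choice is to keep bytes at the edges (file in, file out) and unicode everywhere in between, so there is exactly one decode and one encode to get right.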