You are mixing encodings with your byte strings. Here's a short working example reproducing the issue. I assume you are running in a Windows console that defaults to an encoding of cp852:
#!python2
# coding: utf-8
from xml.etree import ElementTree as et
name_element = et.Element('data')
name_element.text = u'Naturalne mydło odświeżające'
name = et.tostring(name_element, encoding='cp852', method='text')
print name
print name.replace('ł', 'l')
Output (no replacement):
Naturalne mydło odświeżające
Naturalne mydło odświeżające
The reason is that the name string was encoded in cp852, but the byte string constant 'ł' is encoded in the source code encoding of utf-8, so the two byte sequences never match:
print repr(name)
print repr('ł')
Output:
'Naturalne myd\x88o od\x98wie\xbeaj\xa5ce'
'\xc5\x82'
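The same mismatch can be shown in isolation. The following is a minimal sketch, assuming Python 3 (where byte strings must be written with a b prefix or produced by .encode()); the variable names are made up for illustration:

```python
# Python 3 sketch: the same character has different bytes in each encoding.
text = 'mydło'

as_cp852 = text.encode('cp852')   # what the console encoding produces
as_utf8 = text.encode('utf-8')    # what a utf-8 source file stores

print(repr(as_cp852))  # b'myd\x88o'
print(repr(as_utf8))   # b'myd\xc5\x82o'

# The utf-8 bytes for 'ł' do not occur in the cp852 bytes,
# so a byte-level replace() finds nothing and changes nothing.
needle = 'ł'.encode('utf-8')
found = needle in as_cp852
unchanged = as_cp852.replace(needle, b'l') == as_cp852
print(found, unchanged)  # False True
```

A byte-level replace only works when both sides use the same encoding, which is why comparing at the Unicode level is less error-prone.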
The best solution is to use Unicode strings: decode the encoded bytes back to Unicode and use Unicode constants in the replacement:
#!python2
# coding: utf-8
from xml.etree import ElementTree as et
name_element = et.Element('data')
name_element.text = u'Naturalne mydło odświeżające'
name = et.tostring(name_element, encoding='cp852', method='text').decode('cp852')
print name
print name.replace(u'ł', u'l')
print repr(name)
print repr(u'ł')
Output (replacement was made):
Naturalne mydło odświeżające
Naturalne mydlo odświeżające
u'Naturalne myd\u0142o od\u015bwie\u017caj\u0105ce'
u'\u0142'
Note that Python 3's et.tostring has a Unicode option (encoding='unicode'), and string constants are Unicode by default. The repr() version of the string is more readable as well, but ascii() implements the old behavior. You'll also find that Python 3.6 will print Polish even to Windows consoles not using a Polish code page, so maybe you wouldn't need to replace the characters at all.
#!python3
# coding: utf-8
from xml.etree import ElementTree as et
name_element = et.Element('data')
name_element.text = 'Naturalne mydło odświeżające'
name = et.tostring(name_element, encoding='unicode', method='text')
print(name)
print(name.replace('ł', 'l'))
print(repr(name), repr('ł'))
print(ascii(name), ascii('ł'))
Output:
Naturalne mydło odświeżające
Naturalne mydlo odświeżające
'Naturalne mydło odświeżające' 'ł'
'Naturalne myd\u0142o od\u015bwie\u017caj\u0105ce' '\u0142'
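If you do still need ASCII-only output, chaining one replace() per character gets tedious. Here is a rough sketch for Python 3 using str.maketrans and str.translate. The table POLISH_TO_ASCII and the function strip_polish_diacritics are hypothetical names for illustration, and the mapping covers only the Polish letters. An explicit table is used because 'ł' has no Unicode decomposition, so unicodedata.normalize alone would leave it untouched.

```python
# Python 3 sketch: map Polish letters to ASCII look-alikes in one pass.
# POLISH_TO_ASCII and strip_polish_diacritics are illustrative names,
# not part of any library.
POLISH_TO_ASCII = str.maketrans(
    'ąćęłńóśźżĄĆĘŁŃÓŚŹŻ',
    'acelnoszzACELNOSZZ',
)

def strip_polish_diacritics(text):
    """Return text with Polish letters replaced by their ASCII base letters."""
    return text.translate(POLISH_TO_ASCII)

result = strip_polish_diacritics('Naturalne mydło odświeżające')
print(result)  # Naturalne mydlo odswiezajace
```

This works on Unicode strings only, so any encoded bytes should be decoded first, as in the examples above.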