I'm using Selenium to enter text containing German umlauts into a web form. The declared encoding of the Python script is UTF-8, and the page also uses UTF-8. When I define a string like this, everything works fine:

q = u"Hällö" #type(q) returns unicode
...
textbox.send_keys(q)

But when I try to read the string from a config file using ConfigParser (or from any other kind of file), I get malformed output in the web form (Hällö). This is the code I use for that:

q = parser.get('info', 'query') # type(q) returns str
the_encoding = chardet.detect(q)['encoding'] # reports utf-8
q = q.decode('unicode-escape') # type(q) returns unicode
textbox.send_keys(q)

What's the difference between the two q's passed to the send_keys function?

Robin
  • Try `q.decode('latin-1')` instead. – cs95 Aug 06 '17 at 18:43
  • Getting the same malformed output – Robin Aug 06 '17 at 18:53
  • This is a classic example of mojibake. If you do this in a UTF-8 terminal (in Python 2 or 3): `print(u"Hällö".encode('utf8').decode('latin1'))`, you'll get `Hällö`. Conversely, `print(u'Hällö'.encode('latin1').decode('utf8'))` prints `Hällö`. – PM 2Ring Aug 06 '17 at 19:25
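
The round trip described in the last comment can be sketched as follows. This is only an illustration: `fix_mojibake` is a hypothetical helper name, not something from the question, and it assumes the garbling came from UTF-8 bytes being decoded as Latin-1.

```python
def fix_mojibake(text):
    """Hypothetical helper: undo UTF-8 bytes that were decoded as Latin-1."""
    return text.encode("latin-1").decode("utf-8")


original = u"H\u00e4ll\u00f6"  # the text u"Hällö", written with escapes

# Encode as UTF-8, then decode with the wrong codec: this produces mojibake.
garbled = original.encode("utf-8").decode("latin-1")

# Reversing the wrong step recovers the original text.
repaired = fix_mojibake(garbled)

print(repr(garbled))
print(repr(repaired))
```

Repairing after the fact only works while every garbled character still fits in Latin-1; decoding with the right codec in the first place is the more robust choice.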

2 Answers

This is probably an encoding problem. Try printing `q` before the last statement and check whether it is what you expect. The line `q = parser.get('info', 'query')` (where `type(q)` is `str`) should return the byte string `'H\xc3\xa4ll\xc3\xb6'`. If it returns something different, the file is being read with the wrong encoding.

>>> q = u"Hällö"  # unicode obj
>>> q
u'H\xe4ll\xf6'
>>> print q
Hällö
>>> q.encode('utf-8')
'H\xc3\xa4ll\xc3\xb6'
>>> a = q.encode('utf-8')  # str obj
>>> a
'H\xc3\xa4ll\xc3\xb6'  # <-- this should be the value of the str
>>> a.decode('utf-8')  # <-- unicode obj
u'H\xe4ll\xf6'
>>> print a.decode('utf-8')
Hällö
>>> 
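
Applied to the question's code, the session above suggests the decode step is where things go wrong. A minimal sketch, assuming the parser really does return the UTF-8 bytes shown above (the variable names `raw`, `via_escape`, and `via_utf8` are illustrative only):

```python
# Bytes a UTF-8 encoded config file would hold for the value "Hällö".
raw = b"H\xc3\xa4ll\xc3\xb6"

# 'unicode_escape' interprets backslash escapes and treats every other
# byte as Latin-1, so each UTF-8 byte becomes its own character.
via_escape = raw.decode("unicode_escape")

# Decoding with the codec the file was written in gives the intended text.
via_utf8 = raw.decode("utf-8")

print(repr(via_escape))
print(repr(via_utf8))
```

The sketch runs on Python 2 and 3. If the value printed from the parser differs from `raw`, the file itself is probably not UTF-8 and a different codec would be needed.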
Chen A.
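from ConfigParser import SafeConfigParser
import codecs

parser = SafeConfigParser()

# 'utf-8-sig' decodes UTF-8 and skips a leading BOM if the file has one
with codecs.open('cfg.ini', 'r', encoding='utf-8-sig') as f:
    parser.readfp(f)
greet = parser.get('main', 'greet')  # unicode object

# Encode with plain 'utf-8' for output; 'utf-8-sig' would prepend a BOM
print 'greet:', greet.encode('utf-8')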

greet: Hällö

The cfg.ini file:

[main]
greet=Hällö
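
For readers on Python 3, roughly the same approach can be sketched with the standard `configparser` module, whose `read()` accepts an `encoding` argument. This is an assumption-laden illustration rather than part of the answer: it writes a throwaway `cfg.ini` to a temporary directory so that it is self-contained.

```python
import configparser
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cfg.ini")

    # Create a sample file with a UTF-8 BOM; the value is "Hällö".
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write("[main]\ngreet=H\u00e4ll\u00f6\n")

    parser = configparser.ConfigParser()
    # 'utf-8-sig' strips the BOM if present and also reads plain UTF-8.
    parser.read(path, encoding="utf-8-sig")
    greet = parser.get("main", "greet")  # already a str, no decode needed

print("greet:", greet)
```

Because the value comes back as text, it could be passed to `send_keys` directly without any further decoding.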
Vadim