I am currently working on a Python script (App Engine) that takes text input from the user and stores it in the database for redistribution later.
The incoming text may or may not already be URL-encoded, and I need it to end up encoded exactly once.
Example texts from clients:
- This%20is%20a%20test
- This is a test
In Python, I thought I could decode the text and then encode it again, so that both samples become:
- This%20is%20a%20test
- This%20is%20a%20test
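As a minimal sketch of that idea for plain ASCII input: the helper name `normalize_once` is only illustrative, and the import fallback is an assumption so the snippet runs on both Python 2 and Python 3.

```python
try:
    from urllib.parse import quote, unquote  # Python 3 location
except ImportError:
    from urllib import quote, unquote  # Python 2 location


def normalize_once(text):
    """Undo any existing percent-encoding, then apply it exactly once."""
    return quote(unquote(text))


# The two sample inputs from above: one already encoded, one raw.
samples = ["This%20is%20a%20test", "This is a test"]
normalized = [normalize_once(s) for s in samples]
```

Running the helper on its own output gives the same string again, which is the property I am after. One caveat: a raw input that happens to contain a literal sequence such as `%20` cannot be told apart from already-encoded input.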
The code that I am using is as follows:
import urllib
#
# Encode as UTF-8
#
pl = pl.encode('UTF-8')
#
# Unquote the string, then requote to ensure it is encoded only once
#
pl = urllib.quote(urllib.unquote(pl))
Where pl is the payload POST parameter.
The Issue
The issue is that I sometimes receive non-ASCII characters (Chinese, Arabic, and so on), and then I get the following error:
'ascii' codec can't encode character u'\xc3' in position 0: ordinal not in range(128)
..snip..
return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc3' in position 0: ordinal not in range(128)
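To pin down the result I am after for non-ASCII input, here is a rough sketch of the desired round trip rather than what the script currently does. It assumes every payload is UTF-8; `normalize_payload` and the `PY2` flag are hypothetical names used only for illustration. The underlying idea is that Python 2's `urllib.quote` works on byte strings, while Python 3's `urllib.parse.quote` works on text, so the value is converted to the matching type first.

```python
import sys

try:
    from urllib.parse import quote, unquote  # Python 3 location
except ImportError:
    from urllib import quote, unquote  # Python 2 location

PY2 = sys.version_info[0] == 2


def normalize_payload(value):
    """Return value percent-encoded exactly once, assuming UTF-8 input."""
    if PY2:
        # Python 2: quote/unquote operate on byte strings.
        if isinstance(value, unicode):  # noqa: F821 (name exists on Python 2 only)
            value = value.encode("utf-8")
    else:
        # Python 3: quote/unquote operate on text and use UTF-8 by default.
        if isinstance(value, bytes):
            value = value.decode("utf-8")
    return quote(unquote(value))


# Two Chinese characters, written with escapes to keep the source ASCII.
raw_text = u"\u4e2d\u6587"
once = normalize_payload(raw_text)
twice = normalize_payload(once)
```

Here `once` and `twice` should be identical, since the already-encoded string is unquoted before being quoted again.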
Does anyone know the best way to process the string given the above issue?
Thanks.