Encoding utf-8 from html form

Question

I have a Python/Django program that handles some Greek text. It fails with the following error: 'ascii' codec can't encode characters in position 0-5: ordinal not in range(128)

On this line of code

if l.answer == str(request.POST.get('resp_162')).encode('utf-8'):

The input is Μεξικό. It clearly doesn't like the accented o.

I've read the documentation but I really don't get it.

Possible duplicate of [Encoding gives "'ascii' codec can't encode character … ordinal not in range(128)"](http://stackoverflow.com/questions/2513027/encoding-gives-ascii-codec-cant-encode-character-ordinal-not-in-range128) — Sayse, May 12 '16 at 10:48
Why are you trying to convert it to a bytestring instead of keeping it as text? — Ignacio Vazquez-Abrams, May 12 '16 at 10:48
I've tried with and without the '.encode('utf-8') and get the same error — HenryM, May 12 '16 at 10:52
But *why* are you converting it to str at all? Why not just `if l.answer == request.POST.get('resp_162')`? — Daniel Roseman, May 12 '16 at 10:55

score 0 · Accepted Answer · answered May 12 '16 at 11:19

request.POST.get('resp_162') returns a unicode object (a unicode string), or None if the key is missing, but that's another problem. In Python 2 there are two ways to convert it to a str object (a byte string): by passing it to str, i.e. str(request.POST.get('resp_162')), or by encoding it with an explicit codec using unicode.encode(...), i.e. request.POST.get('resp_162').encode("utf-8"). The first uses the default 'ascii' codec; the second uses the codec you ask for.

Since you pass your unicode string to str first, and it contains non-ASCII characters, you get a UnicodeEncodeError at that point, before .encode('utf-8') ever runs. In other words: just use the second solution and you won't get the error.
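A minimal sketch of the difference, assuming Python 2 semantics where str() on a unicode object encodes implicitly with the 'ascii' codec. The explicit .encode("ascii") call below stands in for that implicit step, so the snippet behaves the same on Python 3; the variable names are illustrative only.

```python
# The Greek word from the question, written with escapes so the
# source file itself stays pure ASCII.
text = u"\u039c\u03b5\u03be\u03b9\u03ba\u03cc"

# Stand-in for what Python 2's str(text) does implicitly:
# encode with the 'ascii' codec, which cannot represent Greek letters.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Encoding with an explicit codec works, and decoding restores the text.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")
```

Each of the six Greek letters takes two bytes in UTF-8, so utf8_bytes is twelve bytes long, while the text itself is still six characters.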

That being said: internally, Django only uses unicode strings (for what you get from your models, forms, request, etc.), and the only sane approach is to stick to unicode strings everywhere (decode byte strings at system input and encode them to the desired encoding at system output). I don't know what l.answer is in your snippet, but if 'l' (a very bad naming choice, FWIW) is a model instance and .answer is a text field, it is already a unicode string, so you really shouldn't try to make it a byte string.
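As a rough sketch of what "stick to text everywhere" could look like for the comparison: to_text and answers_match are hypothetical helpers, not Django APIs, and the assumption is that byte strings, if any show up, are UTF-8 encoded.

```python
def to_text(value, encoding="utf-8"):
    """Return value as a text string; decode byte strings, pass None through."""
    if value is None:
        return None
    if isinstance(value, bytes):  # bytes is an alias of str on Python 2
        return value.decode(encoding)
    return value


def answers_match(stored_answer, submitted):
    """Compare a stored answer with a submitted one, both as text."""
    submitted = to_text(submitted)
    if submitted is None:
        # The form field was missing, so there is nothing to compare.
        return False
    return to_text(stored_answer) == submitted
```

In the view this would be called as answers_match(l.answer, request.POST.get('resp_162')), with no str() or .encode() call on the submitted value.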
