5

I send Cyrillic letters from Postman to Django as a URL parameter and get something like %D0%B7%D0%B2 in the variable search_text.

If I print search_text, I get something like текст printed.

I tried the following in the console and didn't get an error:

>>> a = "текст"
>>> a
'\xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'
>>> print a
текст
>>> b = a.decode("utf-8")
>>> b
u'\u0442\u0435\u043a\u0441\u0442'
>>> print b
текст
>>>

But outside the console I do get an error:

"""WHERE title LIKE '%%{}%%' limit '{}';""".format(search_text, limit))
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

How can I prevent it?

Roberto

3 Answers

3

To decode a URL-encoded string (with '%' signs), use urllib:

import urllib
byte_string=urllib.unquote('%D0%B7%D0%B2')

and then you'll need to decode the byte_string from its original encoding, i.e.:

import urllib
import codecs
byte_string=urllib.unquote('%D0%B7%D0%B2')
unicode_string=codecs.decode(byte_string, 'utf-8')

and print(unicode_string) will print зв.
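For readers on Python 3, where unquote lives in urllib.parse rather than directly in urllib, the same two steps might look like the sketch below. It assumes the parameter was percent-encoded from UTF-8 bytes, as in the question; the variable names are illustrative only.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: percent-decoding followed by an explicit UTF-8 decode.
from urllib.parse import unquote, unquote_to_bytes

encoded = '%D0%B7%D0%B2'

# Step 1: undo the percent-encoding, yielding raw bytes.
byte_string = unquote_to_bytes(encoded)

# Step 2: decode the bytes with the encoding the client used (assumed UTF-8).
unicode_string = byte_string.decode('utf-8')

# unquote() can perform both steps at once when given the encoding.
one_step = unquote(encoded, encoding='utf-8')

print(unicode_string)
```

Keeping the two steps separate makes it easier to swap in another encoding if the client turns out not to send UTF-8.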

The problem is the unknown encoding: you have to know which encoding is used for the data you receive. To declare the encoding of your .py source file, place the following line at the top:

# -*- coding: utf-8 -*-

Cyrillic text is most commonly encoded as 'cp866', 'cp1251', 'koi8_r' or 'utf-8', so try those when calling decode.
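One way to try those encodings is sketched below in Python 3. The helper name decode_candidates is hypothetical, not part of any library. Note that the single-byte encodings (cp866, cp1251, koi8_r) accept almost any byte sequence, so a wrong guess usually produces unreadable text rather than an exception; only the UTF-8 attempt reliably fails on foreign data, and the results still need to be inspected by eye.

```python
# -*- coding: utf-8 -*-
# Sketch: decode the same bytes with several candidate encodings.

def decode_candidates(data, encodings=('utf-8', 'cp1251', 'koi8_r', 'cp866')):
    """Return a dict mapping encoding name to decoded text, or None on failure."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# Sample input: the word from the question, encoded as cp1251 for illustration.
sample = 'текст'.encode('cp1251')
results = decode_candidates(sample)

for name, text in results.items():
    print(name, repr(text))
```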

Python 2 doesn't use unicode for string literals by default, so it's best to enable it or switch to Python 3. To make string literals unicode in a .py file, put the following line before all other imports:

from __future__ import unicode_literals

For example, in Python 2.7.9 the following works fine:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals

a="текст"
c="""WHERE title LIKE '%%{}%%' limit '{}';""".format(a, '10')
print(c)
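As an alternative to formatting the search text into the SQL string, the value can be passed as a bound parameter, which leaves quoting and encoding to the database driver. The sketch below uses Python 3 and the standard-library sqlite3 module purely as a stand-in; the table name items and its contents are made up for illustration, and Django's own database cursor uses %s placeholders rather than ?.

```python
# -*- coding: utf-8 -*-
# Sketch: a parameterized LIKE query with Cyrillic text (sqlite3 as a stand-in).
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE items (title TEXT)')  # hypothetical table
conn.executemany(
    'INSERT INTO items (title) VALUES (?)',
    [('текст песни',), ('другое',)],
)

search_text = 'текст'
limit = 10

# The wildcards belong to the bound value, not to the SQL string.
rows = conn.execute(
    'SELECT title FROM items WHERE title LIKE ? LIMIT ?',
    ('%' + search_text + '%', limit),
).fetchall()

print(rows)
conn.close()
```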

Also see:

https://docs.python.org/2/library/codecs.html

https://docs.python.org/2/howto/unicode.html

Nikita
  • This helped, thanks, but I still have one problem when I use `unicode_literals` and try to run a query with `like %search_text%`: the query is somehow case sensitive and distinguishes `Зв` from `зв`. I tried SQL `LOWER(title)` or `UPPER(title)` with `LOWER(search_text)`, but it didn't help. Do you have any ideas? How do I make a case-insensitive select for Cyrillic? – Roberto Feb 13 '16 at 23:45
  • @Roberto: that depends on the database. Probably better in a separate question (that may already exist) – RemcoGerlich Feb 14 '16 at 07:57
  • @Roberto, that has nothing to do with `unicode_literals`. `unicode_literals` just makes string literals in Python 2 behave as they do in Python 3, that is, all string literals are unicode by default. Case sensitivity is DB specific, and you might want to use `ILIKE` instead of `LIKE`, which is case insensitive. – Nikita Feb 14 '16 at 09:13
2

It depends on which encoding the Django program expects and which encoding the strings search_text and limit are in. Usually it's sufficient to do this:

"""WHERE title LIKE '%%{}%%' limit '{}';""".decode("utf-8").format(search_text.decode("utf-8"), limit)

EDIT: After reading your edits, it seems you are having problems converting your URL-parsed text back into strings. Here's an example of how to do this:

import urlparse
print urlparse.urlunparse(urlparse.urlparse("ресторан"))
bmbigbang
  • This also returns an error: `UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)` – Roberto Feb 13 '16 at 18:10
  • You really need to be clearer about the encoding used for search_text and limit, and the one required by Django. Try this again, I have edited the code. – bmbigbang Feb 13 '16 at 18:13
  • same error: `return codecs.utf_8_decode(input, errors, True) UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)` – Roberto Feb 13 '16 at 18:19
  • Well, that's because you also have to decode the string that you are formatting. I am changing the code again; make sure "limit" doesn't have any non-ASCII chars in it. – bmbigbang Feb 13 '16 at 18:27
  • still same error, I also tried the same in console and didn't got error. I also updated description. – Roberto Feb 13 '16 at 18:35
  • to convert unicode into correct urls, you can try this: http://stackoverflow.com/questions/804336/best-way-to-convert-a-unicode-url-to-ascii-utf-8-percent-escaped-in-python – bmbigbang Feb 13 '16 at 19:39
1

You can use '{}'.format(search_text.encode('utf-8')) to encode the string as UTF-8, but it will probably show your Cyrillic letters as escapes such as \xd0.
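To illustrate the escaped output, here is a small sketch under Python 3 (an assumption, since the thread itself uses Python 2, where byte strings format differently). Formatting a bytes object inserts its repr, so the Cyrillic letters appear as \x escapes rather than readable text.

```python
# -*- coding: utf-8 -*-
# Python 3 sketch: formatting encoded bytes shows escapes, not Cyrillic letters.
text = 'текст'
encoded = text.encode('utf-8')

# str.format() falls back to the bytes repr, e.g. b'\xd1\x82...'
formatted = '{}'.format(encoded)
print(formatted)

# Decoding with the same encoding recovers the original text.
round_trip = encoded.decode('utf-8')
print(round_trip)
```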

Also read "The Absolute Minimum Every Software Developer Must Know About Unicode and Character Sets".

bastelflp
  • Your suggestion doesn't work for me and returns the following error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 23: ordinal not in range(128) – Roberto Feb 13 '16 at 18:06