-2
str1="khloé kardashian"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 4: ordinal not in range(128)

How can I encode it correctly? I am trying to substitute this string into a URL in a Flask app. It works on the command line but raises the error above in the app:

>>> url ="google.com/q=apple"
>>> url.replace("q=apple", "q={}".format(str1))
'google.com/q=khlo\xc3\xa9 kardashian'
Raj
  • ASCII does not have the character "é". What exactly are you trying to accomplish? – DYZ May 20 '20 at 20:24
  • Also, is this Python 2.7 or 3.x? Please specify, as they handle Unicode characters differently. – DYZ May 20 '20 at 20:29
  • https://stackoverflow.com/questions/51710082/what-does-unicodedata-normalize-do-in-python –  May 20 '20 at 20:30
  • @Anwarvic that is because the browser handles the encoding details for you. See for example section 2 of https://tools.ietf.org/html/rfc3986 – Karl Knechtel May 20 '20 at 20:45

3 Answers

2

You should use urllib to construct the URL correctly. Your URL has other issues as well, e.g., a space character. urllib takes care of them.

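import urllib  # Python 2.7; in Python 3 use urllib.parse.urlencode

params = {'q': str1}
"google.com/" + urllib.urlencode(params)
# 'google.com/q=khlo%C3%A9+kardashian' (urlencode encodes the space as '+')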
DYZ
  • In Python 3, the old `urllib` module was split up into a package, and `urllib.parse` has the relevant tools for handling URL contents (such as this). I fixed the code example accordingly. – Karl Knechtel May 20 '20 at 20:47
  • @KarlKnechtel The OP uses 2.7. Hence, the solution is for 2.7. – DYZ May 20 '20 at 20:48
  • Oh, jeez. Just noticed that. Disappointed that this is still a thing (especially for a question that explicitly revolves around Unicode), but I guess I can't do anything about it. – Karl Knechtel May 20 '20 at 20:50
0

Use UTF-8 instead:

str1="khloé kardashian"
str1.encode("utf-8")
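A caveat worth sketching: in Python 2.7, a plain `"..."` literal is already a byte string, and calling `.encode()` on it first triggers an implicit ASCII decode, which is what fails on non-ASCII bytes. The snippet below is a minimal illustration, assuming you start from a text (unicode) string; the variable names are only for illustration.

```python
# -*- coding: utf-8 -*-
# Start from a text (unicode) string; \u00e9 is "é".
text = u"khlo\u00e9 kardashian"

# Encoding text to UTF-8 yields bytes; "é" becomes the two bytes C3 A9.
encoded = text.encode("utf-8")

# Decoding the bytes with the same codec recovers the original text.
decoded = encoded.decode("utf-8")
```

Encode only text strings and decode only byte strings; mixing the two is what produces the implicit ASCII conversion errors in Python 2.7.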
jv95
  • ```>>> str1.encode("utf-8") Traceback (most recent call last): File "<stdin>", line 1, in <module> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)``` – Raj May 20 '20 at 20:25
  • Make sure you deleted `str1.encode("ascii")` from your code. – jv95 May 20 '20 at 20:27
0

A URL, per the standard, cannot have é in it. You need to use the appropriate URL encoding, which is handled by the built-in urllib package.
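As a rough sketch of what that looks like in Python 3 (where the relevant functions live in `urllib.parse`; the question itself uses 2.7), the example below percent-encodes the search term. `build_search_url` is a hypothetical helper, and the base string simply mirrors the placeholder from the question.

```python
from urllib.parse import quote, urlencode


def build_search_url(base, term):
    """Hypothetical helper: append a percent-encoded 'q' parameter to base."""
    # urlencode encodes the value as UTF-8 and percent-escapes it;
    # spaces in query values become '+'.
    return base + "?" + urlencode({"q": term})


str1 = "khlo\u00e9 kardashian"  # "khloé kardashian"

url = build_search_url("google.com/", str1)

# quote() is the variant for path segments; it encodes a space as %20.
path_part = quote(str1)
```

Either way, "é" is sent as the UTF-8 bytes `%C3%A9`, so the resulting URL contains only ASCII characters.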

Karl Knechtel
  • There is a separate scheme called `idna` that is used to handle Unicode characters within the domain name, so that the browser can display Unicode to the user while sending something more old-fashioned to the DNS. This is also provided for in the `urllib` package. See for example https://stackoverflow.com/questions/41067320/string-encodings-idna-utf-8-python – Karl Knechtel May 20 '20 at 20:41
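For the domain-name case mentioned in the comment above, Python ships a built-in `idna` codec. The following is a minimal sketch, assuming a made-up host name; `to_ascii_host` is a hypothetical helper, not part of any library.

```python
def to_ascii_host(host):
    """Hypothetical helper: convert a Unicode host name to its ASCII form."""
    # The built-in "idna" codec converts each non-ASCII label to an
    # ASCII-compatible "xn--" form that DNS can handle.
    return host.encode("idna").decode("ascii")


# Made-up host name for illustration; \u00e9 is "é".
ascii_host = to_ascii_host(u"khlo\u00e9.example")
```

This applies only to the host part of a URL; the path and query string still need percent-encoding.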