
I want to URL-encode strings that contain special characters. In my case these are š, ä, õ, æ, ø (it is not a finite list).

`urllib2.quote(symbol)` gives a very strange, incorrect result. How else can these symbols be encoded?

AlG
Bob
  • What is the expected encoded result? – falsetru Jul 25 '14 at 11:33
  • "it is not a finite list" I might want to challenge this statement, seeing that the list of possible Unicode characters is finite. ;-) Also, what @falsetru said. I get `urllib2.quote('ä') == '%C3%A4'`, which is correct. – Kijewski Jul 25 '14 at 11:34
  • `urllib2.quote("Grønlandsleiret, Oslo, Norway")` gives `%27Gr%B8nlandsleiret%2C%20Oslo%2C%20Norway%27`, and when I make a request to Google Maps (https://maps.googleapis.com/maps/api/geocode/json?address=%27Gr%B8nlandsleiret%2C%20Oslo%2C%20Norway%27) I get an invalid request as the response. – Bob Jul 25 '14 at 11:36
  • @Bob, `urllib2.quote("Grønlandsleiret, Oslo, Norway")` returns `'Gr%C3%B8nlandsleiret%2C%20Oslo%2C%20Norway'` for me, and accessing the corresponding URL shows me a valid response: https://maps.googleapis.com/maps/api/geocode/json?address=Gr%C3%B8nlandsleiret%2C%20Oslo%2C%20Norway – falsetru Jul 25 '14 at 11:38
  • I am using Python 2.7.8. Could this be the problem? – Bob Jul 25 '14 at 11:39
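For reference, the `%C3%A4` in Kijewski's comment is simply each UTF-8 byte of "ä" written as `%XX`. Below is a minimal Python 3 sketch (in Python 3 the function lives in `urllib.parse`, not `urllib2`) that builds the escape by hand and compares it with `quote()`; `percent_encode_utf8` is a hypothetical helper name. Note that "ø" is the two UTF-8 bytes C3 B8, so the `%B8` in the question's output is only the second of them.

```python
from urllib.parse import quote


def percent_encode_utf8(ch: str) -> str:
    """Hypothetical helper: percent-encode every UTF-8 byte of ``ch``."""
    # Iterating over bytes yields ints in Python 3; format each as %XX.
    return "".join("%{:02X}".format(b) for b in ch.encode("utf-8"))


for ch in "äø":
    # quote() encodes str input as UTF-8 by default, so both columns agree.
    print(ch, percent_encode_utf8(ch), quote(ch))
```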

1 Answer


> `urllib2.quote("Grønlandsleiret, Oslo, Norway")` gives `%27Gr%B8nlandsleiret%2C%20Oslo%2C%20Norway%27`

Use UTF-8 explicitly then:

urllib2.quote(u"Grønlandsleiret, Oslo, Norway".encode('UTF-8'))
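The same fix in Python 3, as a minimal sketch: `urllib.parse.quote()` and `urlencode()` encode `str` input as UTF-8 by default, so no explicit `.encode('UTF-8')` is needed there. `geocode_url` is a hypothetical helper; the endpoint is the one from the question, and the sketch only builds the URL without sending a request.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the question; nothing is requested here.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address: str) -> str:
    """Hypothetical helper: build the geocode URL for ``address``."""
    # quote_via=quote keeps spaces as %20 (the default, quote_plus, uses '+').
    # str values are encoded as UTF-8 before percent-encoding.
    query = urlencode({"address": address}, quote_via=quote)
    return GEOCODE_URL + "?" + query


print(geocode_url("Grønlandsleiret, Oslo, Norway"))
```

`quote_via=quote` is chosen so the result matches the URL in falsetru's comment character for character; with the default `quote_plus`, spaces would come out as `+`.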

And always declare the source encoding at the top of your file; see PEP 0263.
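A minimal sketch of what the PEP 0263 declaration looks like at the top of a source file; `ADDRESS` and `ENCODED` are illustrative names. The snippet runs under both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
# PEP 263: this declaration must sit on line 1 or 2 of the source file.
# Python 2 raises a SyntaxError on non-ASCII source bytes without it;
# Python 3 already defaults to UTF-8, so there it is redundant but harmless.

# With the declaration, u"ø" is one code point (U+00F8), not two raw bytes.
ADDRESS = u"Grønlandsleiret, Oslo, Norway"

# Explicit UTF-8 bytes, ready to hand to a quote() function.
ENCODED = ADDRESS.encode("utf-8")
```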


A non-UTF-8 byte string needs to be decoded first, then encoded:

import urllib2

s = 'Gr\xf8nlandsleiret'   # Example: you've got a Latin-1 str "s".
s = s.decode('latin-1')    # (or whatever the encoding might be …)
                           # Now "s" is a unicode object.
s = s.encode('utf-8')      # Encode as a UTF-8 string.
                           # Now "s" is a str again.
s = urllib2.quote(s)       # URL-encode.
                           # Now "s" is 'Gr%C3%B8nlandsleiret'.
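In Python 3 the same three steps shrink to a single decode, because `urllib.parse.quote()` encodes a `str` as UTF-8 itself. A minimal sketch, assuming the input bytes are Latin-1 (the example value is made up); it also shows what goes wrong if the raw bytes are quoted directly.

```python
from urllib.parse import quote

raw = b"Gr\xf8nlandsleiret"   # Example Latin-1 bytes: 0xF8 is "ø" in Latin-1.

wrong = quote(raw)            # Bytes are escaped as-is: 'Gr%F8nlandsleiret'
text = raw.decode("latin-1")  # Decode with the encoding the bytes really use.
right = quote(text)           # str is encoded as UTF-8 first: 'Gr%C3%B8nlandsleiret'
```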
Kijewski
  • It works! encode('UTF-8') was what I was looking for. – Bob Jul 25 '14 at 11:41
  • Question: if I have the address in a variable, then `urllib2.quote('u' + address.encode('UTF-8'))` gives an error: `UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 6: ordinal not in range(128)`. What should I do? – Bob Jul 25 '14 at 11:55
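On the follow-up: in Python 2 the `u` prefix only works on string literals, so `'u' + ...` just prepends the letter "u". The error itself comes from `address.encode('UTF-8')`: when `address` is already a byte `str`, Python 2 first decodes it implicitly with the ASCII codec, which fails on byte 0xe4 ("ä" if the bytes are Latin-1). The Python 2 fix is to decode with the real encoding first, as in the answer's second snippet, or to pass an already-UTF-8 `str` straight to `urllib2.quote()`. Below is a minimal Python 3 sketch of a helper that accepts either text or bytes; `quote_address` and its Latin-1 default are assumptions, not something from the thread.

```python
from urllib.parse import quote


def quote_address(address, source_encoding="latin-1"):
    """Hypothetical helper: percent-encode an address given as str or bytes.

    ``source_encoding`` is an assumption and must match how the bytes were
    actually produced (e.g. 0xE4 is "ä" in Latin-1).
    """
    if isinstance(address, bytes):
        # Turn raw bytes into text before any UTF-8 encoding happens.
        address = address.decode(source_encoding)
    # quote() encodes str input as UTF-8; safe="" also escapes "/".
    return quote(address, safe="")


print(quote_address("Grønlandsleiret, Oslo, Norway"))
print(quote_address(b"Gr\xf8nlandsleiret, Oslo, Norway"))  # Latin-1 bytes
```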