When I enter a URL such as https://dl.dropbox.com/u/94943007/file.kml into maps.google.com, it encodes the URL as:

https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml

What is this encoding called, and is there a way to encode a URL like this using Python?

I tried this:

The process is called URL encoding:

>>> urllib.quote('https://dl.dropbox.com/u/94943007/file.kml', '')
'https%3A%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'

but did not get the expected results:

'https%3A//dl.dropbox.com/u/94943007/file.kml'

What I need is this:

https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml

How do I encode this URL properly?

The documentation here:

https://developers.google.com/maps/documentation/webservices/

states:

All characters to be URL-encoded are encoded using a '%' character and a two-character hex value corresponding to their UTF-8 character. For example, 上海+中國 in UTF-8 would be URL-encoded as %E4%B8%8A%E6%B5%B7%2B%E4%B8%AD%E5%9C%8B. The string ? and the Mysterians would be URL-encoded as %3F+and+the+Mysterians.

Marcelo
Alex Gordon

1 Answer

Use

urllib.quote_plus(url, safe=':')

Since you don't want the colon encoded you need to specify that when calling urllib.quote():

>>> import urllib  # Python 2; in Python 3 these functions live in urllib.parse
>>> expected = 'https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml'
>>> url = 'https://dl.dropbox.com/u/94943007/file.kml'
>>> urllib.quote(url, safe=':') == expected
True

urllib.quote() takes a keyword argument safe, which defaults to '/' and lists the characters that are considered safe and therefore are not encoded. In your first example you passed '', which is why the slashes were encoded. The unexpected output you pasted below it, where the slashes weren't encoded, was probably from a previous attempt where you didn't set safe at all.

Overriding the default of '/' with ':' leaves the colon alone while still encoding the slashes, which is what finally yields the desired result.
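For anyone on Python 3, where quote() lives in urllib.parse, the effect of the safe argument can be sketched roughly like this. The variable names are only for illustration, and the URL is the one from the question:

```python
from urllib.parse import quote  # Python 3 location of urllib.quote

url = 'https://dl.dropbox.com/u/94943007/file.kml'

# Default safe='/': slashes are kept, the colon is encoded.
default_safe = quote(url)

# safe='': nothing is exempt, so both the colon and the slashes are encoded.
nothing_safe = quote(url, safe='')

# safe=':': only the colon is exempt, so the slashes are encoded.
colon_safe = quote(url, safe=':')

print(default_safe)  # https%3A//dl.dropbox.com/u/94943007/file.kml
print(nothing_safe)  # https%3A%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
print(colon_safe)    # https:%2F%2Fdl.dropbox.com%2Fu%2F94943007%2Ffile.kml
```

The three outputs correspond to the unexpected result, the first attempt, and the desired result from the question, in that order.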

Edit: Additionally, the API calls for spaces to be encoded as plus signs. Therefore urllib.quote_plus() should be used (whose keyword argument safe doesn't default to '/').
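As a rough check against the examples in the quoted documentation, here is a small Python 3 sketch. It assumes urllib.parse.quote_plus as the Python 3 counterpart of urllib.quote_plus, and encode_for_maps is a hypothetical helper name of mine, not part of any API:

```python
from urllib.parse import quote_plus  # Python 3 location of urllib.quote_plus


def encode_for_maps(value):
    """Hypothetical helper: percent-encode value, keep ':' and turn spaces into '+'."""
    return quote_plus(value, safe=':')


# The URL from the question: slashes encoded, colon kept.
print(encode_for_maps('https://dl.dropbox.com/u/94943007/file.kml'))

# The two examples from the quoted documentation.
print(encode_for_maps('上海+中國'))             # non-ASCII text is encoded as UTF-8 bytes
print(encode_for_maps('? and the Mysterians'))  # spaces become plus signs
```

Unlike quote(), quote_plus() encodes a literal '+' as %2B, so it cannot be confused with an encoded space.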

Lukas Graf
  • Note that this does not deal with the lack of escaping that OP was expecting. – Marcin Aug 24 '12 at 18:43
  • @Marcin, I'm not exactly sure what you mean. Are you referring to `%xx` escapes or something else? – Lukas Graf Aug 24 '12 at 18:47
  • I downvoted this because it did not at the time make explicit its results or how it would resolve *all* of the issues identified by OP. It could still use an explanation of how it achieves the desired outcome. – Marcin Aug 24 '12 at 18:57
  • Oh, I see. I edited my answer to explain a little bit better what's going on. – Lukas Graf Aug 24 '12 at 19:07