
Is there a simple method I'm missing in urllib or another library for this task? URL encoding replaces unsafe ASCII characters with a "%" followed by two hexadecimal digits.

Here's an example of an input and my expected output:

Mozilla/5.0 (Linux; U; Android 4.0; xx-xx; Galaxy Nexus Build/IFL10C) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30

Mozilla%2F5.0+%28Linux%3B+U%3B+Android+4.0%3B+xx-xx%3B+Galaxy+Nexus+Build%2FIFL10C%29+AppleWebKit%2F534.30+%28KHTML%2C+like+Gecko%29+Version%2F4.0+Mobile+Safari%2F534.30
wim

3 Answers


For Python 2.x, use urllib.quote

Replace special characters in string using the %xx escape. Letters, digits, and the characters '_.-' are never quoted. By default, this function is intended for quoting the path section of the URL. The optional safe parameter specifies additional characters that should not be quoted — its default value is '/'.

example:

In [1]: import urllib

In [2]: urllib.quote('%')
Out[2]: '%25'
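
To illustrate the safe parameter described above, here is a minimal sketch. It assumes Python 3, where the same function lives at urllib.parse.quote; the path value is made up for the example.

```python
from urllib.parse import quote

# Hypothetical path, chosen only to show the effect of `safe`.
path = "docs/my file.txt"

# Default safe='/' leaves the slash alone and escapes the space.
default_quoted = quote(path)

# safe='' escapes the slash as well.
fully_quoted = quote(path, safe="")

print(default_quoted)  # docs/my%20file.txt
print(fully_quoted)    # docs%2Fmy%20file.txt
```

Passing safe='' is useful when the whole string is a single value (for instance a query parameter) rather than a path.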

EDIT:

In your case, to replace spaces with plus signs, use urllib.quote_plus

example:

In [4]: urllib.quote_plus('a b')
Out[4]: 'a+b'

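For Python 3.x, use urllib.parse.quote (import urllib.parse explicitly; a bare import urllib does not reliably load the submodule)

>>> import urllib.parse
>>> a = "asdas#@das"
>>> urllib.parse.quote(a)
'asdas%23%40das'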

and for a string containing spaces, use urllib.parse.quote_plus

>>> import urllib.parse
>>> a = "as da& s#@das"
>>> urllib.parse.quote_plus(a)
'as+da%26+s%23%40das'
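
Applied to the user-agent string from the question, this could look like the sketch below. It assumes Python 3; the variable names are only for illustration.

```python
from urllib.parse import quote_plus

# The input string from the question.
user_agent = (
    "Mozilla/5.0 (Linux; U; Android 4.0; xx-xx; Galaxy Nexus Build/IFL10C) "
    "AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30"
)

# quote_plus turns spaces into '+' and, unlike quote, also escapes '/'
# because its default `safe` is the empty string.
encoded = quote_plus(user_agent)
print(encoded)
```

The printed value should match the expected output given in the question.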
qiao
  • or [urllib.quote_plus](http://docs.python.org/library/urllib.html#urllib.quote_plus), since OP wants `+` instead of `%20`. – Avaris Jan 18 '12 at 06:09
  • but to get what the OP asks for, use `urllib.quote_plus`. – Dan D. Jan 18 '12 at 06:10
  • I believe, for Python 3.*, you should do `import urllib.parse ... urllib.parse.quote ...` or `from urllib import parse ... parse.quote ...` rather than `import urllib ... urllib.parse.quote ...`, which will result in `AttributeError: module 'urllib' has no attribute 'parse'`, kind of similar to [imports in `werkzeug`](https://stackoverflow.com/questions/47688957/import-werkzeug-vs-from-werkzeug-import-security). Tested on Python 3.6.1. –  Jul 25 '18 at 09:19

Keep in mind that in Python 2 both urllib.quote and urllib.quote_plus raise a KeyError if the input is a unicode string containing non-ASCII characters:

s = u'\u2013'
urllib.quote(s)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\urllib.py", line 1303, in quote
    return ''.join(map(quoter, s))
KeyError: u'\u2013'

As answered elsewhere on SO, the string has to be encoded to UTF-8 explicitly first:

urllib.quote(s.encode('utf-8'))
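
For comparison, here is a small sketch of the same input under Python 3, assuming urllib.parse.quote, which encodes str input as UTF-8 by default, so the explicit encode step is not needed there.

```python
from urllib.parse import quote, unquote

s = "\u2013"  # EN DASH, the same character as in the Python 2 example

# In Python 3, quote() encodes str input as UTF-8 before escaping.
encoded = quote(s)

# unquote() reverses the escaping, again assuming UTF-8.
decoded = unquote(encoded)

print(encoded)  # %E2%80%93
```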
oldbam

Also, if you have a dict of several values to encode, use urllib.urlencode (urllib.parse.urlencode in Python 3). It URL-encodes each key and value and joins them into a query string.
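
A minimal sketch of that, assuming Python 3; the parameter names and values are invented for the example.

```python
from urllib.parse import urlencode

# Hypothetical query parameters, for illustration only.
params = {"q": "url encoding", "ua": "Mozilla/5.0 (Linux)"}

# urlencode applies quote_plus to every key and value by default,
# then joins the pairs with '&'.
query = urlencode(params)

print(query)  # q=url+encoding&ua=Mozilla%2F5.0+%28Linux%29
```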

Y2H