
I've got a string from an HTTP header, but it's been escaped. What function can I use to unescape it?

myemail%40gmail.com -> myemail@gmail.com

Would urllib.unquote() be the way to go?

Ian

3 Answers


I am pretty sure that urllib's unquote is the common way of doing this (shown here in Python 2):

>>> import urllib
>>> urllib.unquote("myemail%40gmail.com")
'myemail@gmail.com'

There's also unquote_plus:

Like unquote(), but also replaces plus signs by spaces, as required for unquoting HTML form values.

Paolo Bergantino
    K, just wanted to make sure.. I hate using a function that appears to do the job, but ends up only working with a few examples that I did and breaking with real world vars. heh. Then it becomes impossible to track down the problem.. :P – Ian Apr 23 '09 at 04:59

In Python 3, these functions are urllib.parse.unquote and urllib.parse.unquote_plus.
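If the same code has to run on both Python 2 and Python 3, one common pattern is an import shim. This is only a minimal sketch; it assumes nothing beyond the standard library and reuses the example string from the question:

```python
# Import shim: try the Python 3 location first, then fall back to Python 2.
try:
    from urllib.parse import unquote, unquote_plus  # Python 3
except ImportError:
    from urllib import unquote, unquote_plus  # Python 2

# Percent-decode the escaped header value from the question.
decoded = unquote("myemail%40gmail.com")
print(decoded)
```

After the shim, the rest of the code can call unquote and unquote_plus without caring which Python version is running.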

unquote_plus is used, for example, for query strings in HTTP URLs, where space characters (" ") are traditionally encoded as a plus character (+), and a literal + is percent-encoded as %2B.
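A small Python 3 sketch of the difference between the two functions. The sample string is made up for illustration and is not taken from any real request:

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical query-string fragment: '+' stands for a space,
# '%2B' for a literal plus sign, '%20' for a space.
raw = "q=a+b%2Bc%20d"

plain = unquote(raw)      # percent-decoding only; '+' is left untouched
form = unquote_plus(raw)  # '+' becomes a space, then percent-decoding

print(plain)  # q=a+b+c d
print(form)   # q=a b+c d
```

Note that after plain unquote, the original '+' and the decoded '%2B' can no longer be told apart, which is why form data should go through unquote_plus. For splitting a whole query string into keys and values, urllib.parse.parse_qs is probably the more convenient tool.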

In addition to these, there is unquote_to_bytes, which converts the given encoded string to bytes; it can be used when the encoding is not known or the encoded data is binary. However, there is no unquote_plus_to_bytes. If you need it, you can do:

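from urllib.parse import unquote_to_bytes

def unquote_plus_to_bytes(s):
    # Turn '+' into a space first, for either bytes or str input
    if isinstance(s, bytes):
        s = s.replace(b'+', b' ')
    else:
        s = s.replace('+', ' ')
    # Then percent-decode to raw bytes
    return unquote_to_bytes(s)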
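To illustrate why the bytes variant matters when the encoding is not known up front, here is a rough Python 3 sketch. The input is a hypothetical value that was encoded as Latin-1 rather than UTF-8:

```python
from urllib.parse import unquote, unquote_to_bytes

# Hypothetical input: 'café' percent-encoded from Latin-1 bytes, not UTF-8.
encoded = "caf%E9"

# Get the raw bytes first and decide on the encoding afterwards.
raw = unquote_to_bytes(encoded)
as_latin1 = raw.decode("latin-1")

# unquote() assumes UTF-8 by default and substitutes U+FFFD for invalid bytes.
as_default = unquote(encoded)

# If the encoding is known, it can also be passed to unquote() directly.
same = unquote(encoded, encoding="latin-1")

print(raw)
print(as_latin1 == same)
```

Decoding to bytes first keeps the original data intact, so a wrong guess about the encoding can still be corrected later.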

More information on whether to use unquote or unquote_plus is available in the question "URL encoding the space character: + or %20".

Community

Yes, it appears that urllib.unquote() accomplishes that task. (I tested it against your example on codepad.)

las3rjock