
I have a website URL that looks like this:

http://abc.com/hsdl-3201%23008-lite-on-12275800/hsdl-3201%23008-lite-on-12275800

Clearly, some characters that don't fit the URL format have been percent-encoded into sequences like %23. How can I easily decode this back into a Python string that contains the original characters?

Thanks.

B.Mr.W.

2 Answers


Python 3

from urllib.parse import unquote

Python 2

from urllib import unquote

Then

unquote('http://abc.com/hsdl-3201%23008-lite-on-12275800/hsdl-3201%23008-lite-on-12275800')
#>>> 'http://abc.com/hsdl-3201#008-lite-on-12275800/hsdl-3201#008-lite-on-12275800'

Also consider unquote_plus if you are parsing form data, where spaces are encoded as "+" and need to be decoded as well.
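To make the difference concrete, here is a minimal Python 3 sketch. The sample string `encoded` is made up for illustration and is not from the question; it only shows that unquote leaves "+" alone while unquote_plus turns it into a space.

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical form-style value: '+' stands for a space, %21 is '!'
encoded = "name=John+Smith%21"

plain = unquote(encoded)      # decodes %21 but keeps '+' as a literal plus
form = unquote_plus(encoded)  # decodes %21 and also turns '+' into a space

print(plain)
print(form)
```

For the URL in the question there are no "+" characters, so both functions would give the same result there.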

Jon Clements
Veedrac

Using urllib.unquote (Python 2; in Python 3 the same function lives at urllib.parse.unquote):

From the docs:

urllib.unquote(string) Replace %xx escapes by their single-character equivalent.

Example: unquote('/%7Econnolly/') yields '/~connolly/'.
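If the same script has to run on both Python 2 and Python 3, one possible approach is a fallback import. This is only a sketch; the variable name `decoded` is chosen for illustration, and the input is the example from the docs quoted above.

```python
try:
    # Python 3 location
    from urllib.parse import unquote
except ImportError:
    # Python 2 location
    from urllib import unquote

# Same example as in the docs: %7E is the escape for '~'
decoded = unquote('/%7Econnolly/')
print(decoded)
```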

Thomas Orozco