How do I decode percent-encoded characters to ordinary unicode characters?

"Lech_Kaczy%C5%84ski"    ⟶    "Lech_Kaczyński"
Mateen Ulhaq
yak
  • Possible duplicate of [How to unquote a urlencoded unicode string in python?](http://stackoverflow.com/questions/300445/how-to-unquote-a-urlencoded-unicode-string-in-python) – Peter Wood Oct 15 '15 at 08:36

3 Answers

For Python 3, using urllib.parse.unquote:

from urllib.parse import unquote

print(unquote("Lech_Kaczy%C5%84ski"))

Output:

Lech_Kaczyński
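
As a small sketch of the same call with its optional parameters: `unquote` decodes the percent-escapes as UTF-8 by default and accepts `encoding` and `errors` keyword arguments, while `unquote_plus` additionally turns `+` into a space. The sample inputs other than the one from the question are made up for illustration.

```python
from urllib.parse import unquote, unquote_plus

encoded = "Lech_Kaczy%C5%84ski"

# Default behaviour: percent-escapes are decoded as UTF-8.
decoded = unquote(encoded)

# Same result with the encoding spelled out; errors="strict" raises
# UnicodeDecodeError on invalid input instead of substituting a character.
decoded_explicit = unquote(encoded, encoding="utf-8", errors="strict")

# With the default errors="replace", bytes that are not valid UTF-8
# become U+FFFD (the replacement character). "caf%E9" is an invented input.
broken = unquote("caf%E9")

# unquote_plus also converts "+" to a space, as used in form-encoded data.
form_value = unquote_plus("Lech+Kaczy%C5%84ski")

print(decoded == "Lech_Kaczy\u0144ski")  # True
```

If the input may contain bytes in an unknown encoding, `errors="strict"` is one way to fail loudly rather than silently lose characters.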
Mateen Ulhaq

For Python 2, using urllib.unquote:

import urllib
urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8')

This will return a unicode string:

u'Lech_Kaczy\u0144ski'

which you can then print and process as usual. For example:

print(urllib.unquote("Lech_Kaczy%C5%84ski").decode('utf8'))

will result in

Lech_Kaczyński
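
To see why the `decode` step matters, the two stages can be reproduced in Python 3, where bytes and text are separate types. This is only a sketch: `unquote_to_bytes` is used here as a rough stand-in for the byte string that Python 2's `urllib.unquote` returns, not as a claim about how Python 2 is implemented.

```python
from urllib.parse import unquote_to_bytes

# Stage 1: undo the percent-encoding only. The result is raw bytes,
# with the two UTF-8 bytes of the accented letter still undecoded.
raw = unquote_to_bytes("Lech_Kaczy%C5%84ski")

# Stage 2: interpret those bytes as UTF-8 to get real text.
text = raw.decode("utf-8")

print(raw)                  # b'Lech_Kaczy\xc5\x84ski'
print(len(raw), len(text))  # 15 14
```

The length difference shows that the two bytes `\xc5\x84` collapse into a single character once decoded.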
Mateen Ulhaq
  • It gives me `Lech_Kaczy\xc5\x84ski`, instead of `Lech_Kaczyński` – yak Oct 15 '15 at 08:36
  • That doesn't look like a unicode string, are you sure you tried correctly? Here's my session: ... (I'll edit it in the post) – Matthias C. M. Troffaes Oct 15 '15 at 08:38
  • I'm not sure you even need the `decode` call (based only on it working when I try without). – Holloway Oct 15 '15 at 08:41
  • Make sure you put the decode('utf8') at the very end. I can only reproduce what you get if I do the decoding in the wrong place. – Matthias C. M. Troffaes Oct 15 '15 at 08:43
  • Trengot: technically it is not necessary. However, in Python it is generally recommended to convert all your text to unicode as soon as possible, so you don't need to worry about encodings when you pass it to other functions. – Matthias C. M. Troffaes Oct 15 '15 at 08:45
  • @yak, you must use a display method that's compatible with UTF-8. If your Python is expecting an ASCII display, it will not attempt to display non-ASCII symbols. – Jasen Mar 08 '21 at 23:21
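
One plausible way to end up with a string like the one in the first comment is to have the escaped bytes interpreted one character per byte instead of as UTF-8. The Python 3 sketch below reproduces that effect by passing `encoding="latin-1"` to `unquote`, and then repairs it. It illustrates the failure mode under that assumption; it is not a diagnosis of the original session.

```python
from urllib.parse import unquote

encoded = "Lech_Kaczy%C5%84ski"

# Wrong: each escaped byte becomes its own character (U+00C5, U+0084)
# instead of the pair being decoded together as one UTF-8 character.
mangled = unquote(encoded, encoding="latin-1")

# Repair: turn the characters back into the original bytes,
# then decode those bytes as UTF-8.
repaired = mangled.encode("latin-1").decode("utf-8")

print(ascii(mangled))                # 'Lech_Kaczy\xc5\x84ski'
print(repaired == unquote(encoded))  # True
```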

This worked for me in Python 2, on a terminal that displays UTF-8:

import urllib

print urllib.unquote('Lech_Kaczy%C5%84ski')

Prints out

Lech_Kaczyński
answerzilla