-2

How do I decode a Unicode string like this:

what%2527s%2bthe%2btime%252c%2bnow%253f

into ASCII like this:

what's+the+time+now

Thomas Clayson
tim

3 Answers

6

In your case, the string was URL-encoded twice, so we need to call unquote twice to get it back:

In [1]: import urllib
In [2]: urllib.unquote(urllib.unquote("what%2527s%2bthe%2btime%252c%2bnow%253f"))
Out[2]: "what's+the+time,+now?"
Kent
  • At least the outer `unquote` probably wants to be `unquote_plus` instead; I'm guessing those `+`s were originally spaces, submitted as an HTML form (which has a slightly different handling of `+` than regular URL-encoding). But, yeah, the double-encoded string is a red flag for “someone's done something wrong here...” – bobince Sep 24 '11 at 08:08
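To illustrate the difference the comment points out, here is a minimal sketch. It assumes Python 3, where these functions live in `urllib.parse` rather than directly on `urllib` as in the Python 2 session above; the variable names are only for illustration.

```python
from urllib.parse import unquote, unquote_plus

encoded = "what%2527s%2bthe%2btime%252c%2bnow%253f"

# First pass undoes the outer layer of encoding (%25 -> %, %2b -> +).
once = unquote(encoded)

# A second pass with unquote keeps the literal plus signs.
twice = unquote(once)

# A second pass with unquote_plus also turns "+" into spaces,
# which matches how HTML forms encode spaces.
twice_spaces = unquote_plus(once)

print(once)
print(twice)
print(twice_spaces)
```

Which second pass is appropriate depends on whether the `+` characters were originally spaces, which the question does not say.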
0

Something like this?

title = u"what%2527s%2bthe%2btime%252c%2bnow%253f"
print title.encode('ascii','ignore')

Also, take a look at this

aus
0

You could convert the %(hex) escaped chars with something like this:

import re

def my_decode(s):
    # Replace each %-escape (2 to 4 hex digits) with the matching Unicode character.
    return re.sub('%([0-9a-fA-F]{2,4})', lambda x: unichr(int(x.group(1), 16)), s)

s = u'what%2527s%2bthe%2btime%252c%2bnow%253f'
print my_decode(s)

results in the unicode string

u'what\u2527s+the+time\u252c+now\u253f'

Not sure how you'd know to convert \u2527 to a single quote, or drop the \u253f and \u252c chars when converting to ASCII.
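One possible way around this, sketched here under the assumption that the string is ordinary percent-encoding applied twice (as the accepted approach in this thread suggests), is to match exactly two hex digits and run the substitution twice. The helper name `decode_percent` is hypothetical, the code assumes Python 3 (`chr` instead of `unichr`), and it only handles single-byte escapes, not multi-byte UTF-8 sequences.

```python
import re

def decode_percent(s):
    # Replace each %XX (exactly two hex digits) with the character it encodes.
    return re.sub(r"%([0-9a-fA-F]{2})", lambda m: chr(int(m.group(1), 16)), s)

s = "what%2527s%2bthe%2btime%252c%2bnow%253f"

# The first pass turns %25 into "%", leaving escapes like %27 for the second pass.
decoded = decode_percent(decode_percent(s))
print(decoded)
```

With two-digit matching, `%2527` is read as `%25` followed by `27`, so no spurious `\u2527` character appears.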

barryp