Python decode text to ascii

Question

How do I decode a unicode string like this:

what%2527s%2bthe%2btime%252c%2bnow%253f

into ascii like this:

what's+the+time+now

http://stackoverflow.com/questions/275174/how-do-i-perform-html-decoding-encoding-using-python-django — dm03514, Sep 23 '11 at 14:35
"ascii" vs "unicode" is a completely different issue from the one you're having. It could hardly be more different, really. — Karl Knechtel, Sep 23 '11 at 16:09

score 6 · Accepted Answer · answered Sep 23 '11 at 14:36

In your case, the string was URL-encoded twice, so we need to unquote it twice to get it back:

In [1]: import urllib
In [2]: urllib.unquote(urllib.unquote("what%2527s%2bthe%2btime%252c%2bnow%253f") )
Out[2]: "what's+the+time,+now?"
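
On Python 3 these functions live in `urllib.parse` rather than directly on `urllib`. A minimal sketch of the same double unquote, keeping the intermediate value to show why two passes are needed (the variable names `raw`, `once` and `twice` are only illustrative):

```python
from urllib.parse import unquote

raw = "what%2527s%2bthe%2btime%252c%2bnow%253f"

# First pass: each %25 becomes a literal "%", exposing the inner escapes.
once = unquote(raw)
print(once)   # what%27s+the+time%2c+now%3f

# Second pass: the inner escapes (%27, %2c, %3f) are decoded.
twice = unquote(once)
print(twice)  # what's+the+time,+now?
```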

Kent

At least the outer `unquote` probably wants to be `unquote_plus` instead; I'm guessing those `+`s were originally spaces, submitted as an HTML form (which has a slightly different handling of `+` than regular URL-encoding). But, yeah, the double-encoded string is a red flag for “someone's done something wrong here...” – bobince Sep 24 '11 at 08:08
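
A small sketch of that suggestion in Python 3, assuming the `+` characters really were spaces in form-encoded data (the question does not confirm this). The inner call undoes the extra layer of URL-encoding, and the outer `unquote_plus` decodes the remaining escapes and turns `+` into spaces:

```python
from urllib.parse import unquote, unquote_plus

raw = "what%2527s%2bthe%2btime%252c%2bnow%253f"

# Inner unquote: strips the outer encoding layer, leaving form-encoded text.
# Outer unquote_plus: decodes %27, %2c, %3f and maps "+" to " ".
decoded = unquote_plus(unquote(raw))
print(decoded)  # what's the time, now?
```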

score 0 · Answer 2 · answered Sep 23 '11 at 14:30

Something like this?

title = u"what%2527s%2bthe%2btime%252c%2bnow%253f"
print title.encode('ascii','ignore')

Also, take a look at this

aus

score 0 · Answer 3 · answered Sep 23 '11 at 14:45

You could convert the %(hex) escaped chars with something like this:

import re

def my_decode(s):
    # Return the substituted string; without this the function returns None.
    return re.sub('%([0-9a-fA-F]{2,4})', lambda x: unichr(int(x.group(1), 16)), s)

s = u'what%2527s%2bthe%2btime%252c%2bnow%253f'
print my_decode(s)

results in the unicode string

u'what\u2527s+the+time\u252c+now\u253f'

Not sure how you'd know to convert \u2527 to a single quote, or drop the \u253f and \u252c chars when converting to ascii
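
The stray characters appear to come from the `{2,4}` quantifier: a percent escape is always exactly two hex digits, so `%2527` is `%25` (a literal `%`) followed by `27`, not code point 0x2527. Below is a sketch in Python 3 with the quantifier fixed to `{2}` and applied twice, since the input is double-encoded. The helper name `percent_decode` is hypothetical, and this version only handles ASCII escapes; multi-byte UTF-8 sequences are better left to `urllib.parse.unquote`.

```python
import re

def percent_decode(s):
    """Decode one layer of %XX escapes (ASCII code points only)."""
    return re.sub(r'%([0-9a-fA-F]{2})',
                  lambda m: chr(int(m.group(1), 16)),
                  s)

s = 'what%2527s%2bthe%2btime%252c%2bnow%253f'

once = percent_decode(s)      # what%27s+the+time%2c+now%3f
twice = percent_decode(once)  # what's+the+time,+now?
print(twice)
```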
