urllib.unquote_plus(s) does not convert plus symbol to space

Question

From the documentation, urllib.unquote_plus should replace plus signs with spaces, but when I tried the code below in IDLE for Python 2.7, it did not.

>>> import urllib
>>> s = 'http://stackoverflow.com/questions/?q1=xx%2Bxx%2Bxx'
>>> urllib.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx+xx+xx'

I also tried something like urllib.unquote_plus(s).decode('utf-8'). Is there a proper way to decode the URL component?

score 20 · Accepted Answer · edited Jul 19 '22 at 11:51

%2B is the escape code for a literal +; it is being unescaped entirely correctly.

Don't confuse this with the + that URL encoding itself uses, which is the escape for a space:

>>> s = 'http://stackoverflow.com/questions/?q1=xx+xx+xx'
>>> urllib.parse.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx xx xx'

unquote_plus() decodes encoded spaces to literal spaces ('+' -> ' '), but an encoded + symbol only becomes a literal + ('%2B' -> '+'), never a space.
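To make the distinction concrete, here is a minimal sketch using Python 3's urllib.parse (as in the session above). The sample string is made up for illustration; it mixes a bare +, a %2B and a %20, and compares unquote_plus() with plain unquote():

```python
from urllib.parse import unquote, unquote_plus

# Made-up form-encoded value: '+' (encoded space), '%2B' (encoded
# literal plus) and '%20' (another encoded space).
encoded = "a+b%2Bc%20d"

# unquote_plus() turns '+' into ' ' first, then decodes %XX escapes.
plus_decoded = unquote_plus(encoded)

# unquote() only decodes %XX escapes and leaves a bare '+' alone.
plain_decoded = unquote(encoded)

print(repr(plus_decoded))   # 'a b+c d'
print(repr(plain_decoded))  # 'a+b+c d'
```

Only the bare + turns into a space, and only under unquote_plus(); %2B comes out as a literal + either way.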

If your input uses %2B where you expected spaces, those values were probably quoted twice, and you'd need to unquote them twice. In that case you'd see the % escapes encoded too:

>>> urllib.parse.quote_plus('Hello world!')
'Hello+world%21'
>>> urllib.parse.quote_plus(urllib.parse.quote_plus('Hello world!'))
'Hello%2Bworld%2521'

where %25 is the quoted % character.
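A minimal Python 3 sketch of that double-quoting round trip, assuming you already know the value went through quote_plus() twice. Each unquote_plus() call peels off exactly one layer:

```python
from urllib.parse import quote_plus, unquote_plus

original = "Hello world!"

# Quoting twice escapes the '+' and '%' produced by the first pass.
double_quoted = quote_plus(quote_plus(original))
print(double_quoted)  # Hello%2Bworld%2521

# One unquote_plus() pass only undoes the outer layer...
once = unquote_plus(double_quoted)
print(once)  # Hello+world%21

# ...and a second pass recovers the original text, spaces included.
twice = unquote_plus(once)
print(twice)  # Hello world!
```

Only unquote a second time when you know the input was double-quoted; applied to correctly single-quoted data, the extra pass would turn genuine + characters into spaces.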

edited Jul 19 '22 at 11:51

Nallath

answered Sep 06 '13 at 16:21

Martijn Pieters

Your comment contradicts http://stackoverflow.com/questions/4737841/urlencoder-not-able-to-translate-space-character. Not sure who's right or wrong, but seems like this is something that would easily confuse somebody building an API using Python and using Android to develop for it – KVISH Mar 04 '14 at 06:46
How do that question and this answer contradict at all? – Martijn Pieters Mar 04 '14 at 08:29
Spaces are encoded to `+`, but a `+` in the original data is encoded to `%2B`. There is no contradiction here; you are confusing original unencoded data with the encoded result. – Martijn Pieters Mar 04 '14 at 08:35
@Nallath: thanks for the edit, but then also update the *question*. This answer was written to cover Python 2, not Python 3, because that's what the question asker used at the time. – Martijn Pieters Aug 12 '22 at 17:43
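The point made in the comments (spaces are encoded to +, while a + in the original data is encoded to %2B) can be checked from the encoding side. This is a small Python 3 sketch with a made-up input containing one space and one literal plus:

```python
from urllib.parse import quote_plus, unquote_plus

raw = "a b+c"  # made-up original data: one space, one literal plus

encoded = quote_plus(raw)
print(encoded)  # a+b%2Bc  (space -> '+', literal '+' -> '%2B')

# Decoding once gives back exactly the original data.
assert unquote_plus(encoded) == raw
```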

score 4 · Answer 2 · answered Sep 06 '13 at 16:22

Those aren't spaces; they are actual pluses. A space is %20, which in that part of the URL (the query string) is indeed equivalent to +, but %2B means a literal plus.
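One way to apply this is to split off the query component and form-decode only that part. The sketch below uses Python 3's urlsplit() and parse_qs(); the helper name query_params is hypothetical, chosen for illustration. parse_qs() treats + and %20 as spaces and %2B as a literal +:

```python
from urllib.parse import parse_qs, urlsplit

def query_params(url):
    """Hypothetical helper: form-decode only the query part of url.

    parse_qs() maps '+' and '%20' to spaces and '%2B' to a literal '+'.
    """
    return parse_qs(urlsplit(url).query)

print(query_params("http://stackoverflow.com/questions/?q1=xx%2Bxx%2Bxx"))
# {'q1': ['xx+xx+xx']}
print(query_params("http://stackoverflow.com/questions/?q1=xx+xx%20xx"))
# {'q1': ['xx xx xx']}
```

Path segments are a different part of the URL, where + is not a space, so they would be decoded with unquote() rather than unquote_plus().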

answered Sep 06 '13 at 16:22

gcbirzan
