According to the documentation, urllib.unquote_plus should replace plus signs with spaces, but when I tried the code below in IDLE for Python 2.7, it did not.

>>> import urllib
>>> s = 'http://stackoverflow.com/questions/?q1=xx%2Bxx%2Bxx'
>>> urllib.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx+xx+xx'

I also tried urllib.unquote_plus(s).decode('utf-8'). Is there a proper way to decode the URL component?

jjennifer

2 Answers

%2B is the escape code for a literal +; it is being unescaped entirely correctly.

Don't confuse this with the URL escaped +, which is the escape character for spaces:

>>> import urllib.parse
>>> s = 'http://stackoverflow.com/questions/?q1=xx+xx+xx'
>>> urllib.parse.unquote_plus(s)
'http://stackoverflow.com/questions/?q1=xx xx xx'

unquote_plus() turns an encoded space ('+') into a literal space (' '). An encoded plus ('%2B') is decoded to a literal '+', never to a space.
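
A round trip may make the distinction easier to see. This is a minimal sketch for Python 3 (in Python 2.7 the same functions live directly on urllib); the sample value is hypothetical and not taken from the question:

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical original data containing both a space and a literal plus.
original = 'xx xx+xx'

# Encoding: the space becomes '+', the literal plus becomes '%2B'.
encoded = quote_plus(original)
print(encoded)   # xx+xx%2Bxx

# Decoding reverses both steps and recovers the original data.
decoded = unquote_plus(encoded)
print(decoded)   # xx xx+xx
```

So a %2B in the encoded string always stands for a plus sign that was present in the original data.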

If the input you need to decode uses %2B where you expected spaces, the values were perhaps quoted twice, and you would need to unquote them twice. In that case the % escapes are encoded too:

>>> urllib.parse.quote_plus('Hello world!')
'Hello+world%21'
>>> urllib.parse.quote_plus(urllib.parse.quote_plus('Hello world!'))
'Hello%2Bworld%2521'

where %25 is the quoted % character.
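
If the input really was quoted twice, undoing it could look roughly like this. It is a sketch for Python 3 that assumes you already know the value was double-quoted; applying the second unquote to data that was quoted only once would turn genuine plus signs into spaces.

```python
from urllib.parse import quote_plus, unquote_plus

# Simulate a value that was quoted twice.
twice = quote_plus(quote_plus('Hello world!'))
print(twice)   # Hello%2Bworld%2521

# The first pass undoes the outer quoting only.
once = unquote_plus(twice)
print(once)    # Hello+world%21

# The second pass recovers the original text.
plain = unquote_plus(once)
print(plain)   # Hello world!
```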

Nallath
Martijn Pieters
  • Your comment contradicts http://stackoverflow.com/questions/4737841/urlencoder-not-able-to-translate-space-character. Not sure who's right or wrong, but seems like this is something that would easily confuse somebody building an API using Python and using Android to develop for it – KVISH Mar 04 '14 at 06:46
  • How do that question and this answer contradict at all? – Martijn Pieters Mar 04 '14 at 08:29
  • Spaces are encoded to `+`, but a `+` in the original data is encoded to `%2B`. There is no contradiction here; you are confusing original unencoded data with the encoded result. – Martijn Pieters Mar 04 '14 at 08:35
  • @Nallath: thanks for the edit, but then also update the *question*. This answer was written to cover Python 2, not Python 3, because that's what the question asker used at the time. – Martijn Pieters Aug 12 '22 at 17:43
Those aren't spaces, those are actual pluses. A space is %20, which in that part of the URL is indeed equivalent to +, but %2B means a literal plus.
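
As a rough illustration for Python 3, comparing unquote() with unquote_plus() on the same string shows how the three forms are treated. The sample string is made up for the example:

```python
from urllib.parse import unquote, unquote_plus

# Hypothetical sample mixing '+', '%20' and '%2B'.
sample = 'a+b%20c%2Bd'

# unquote() decodes % escapes only and leaves '+' untouched.
plain_result = unquote(sample)
print(plain_result)   # a+b c+d

# unquote_plus() also treats '+' as a space, as in query strings.
plus_result = unquote_plus(sample)
print(plus_result)    # a b c+d
```

In both cases %2B comes out as a literal plus; only the handling of the bare + differs.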

gcbirzan