RFC 1738, Section 2.2, says:

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

After some searching, I concluded that there are three types of characters to consider when encoding a URL:

  • Unsafe characters: '#', ' ', '"', '%', '<', '>', '{', '}', '|', '\', '^', '~', '[', ']', '`'.
  • Reserved characters: ';', '/', '?', ':', '@', '=', '&'.
  • Special characters: "$-_.+!*'(),"

I understand why and when unsafe and reserved characters should be encoded. RFC 1738 states that the special characters can be used unencoded, but I found that urllib2.quote also encodes them: the result is "%24-_.%2B%21%2A%27%28%29%2C%7E". So I am a little confused: why are the special characters encoded if they can be used unencoded within a URL, and what makes them special?
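
For reference, the behavior can be reproduced with a minimal sketch. It assumes Python 3, where quote lives in urllib.parse rather than urllib2; the names SPECIAL, encoded_default and encoded_keep are only illustrative. It also shows that the `safe` argument of quote controls which characters are passed through, so the encoding appears to be a conservative default of the function rather than something the RFC requires.

```python
from urllib.parse import quote, unquote

# The "special" characters listed in RFC 1738, Section 2.2.
SPECIAL = "$-_.+!*'(),"

# By default quote() leaves letters, digits, a few punctuation
# characters such as "-", "_" and ".", and "/" unencoded.
encoded_default = quote(SPECIAL)

# Characters named in `safe` are passed through unencoded.
encoded_keep = quote(SPECIAL, safe=SPECIAL)

print(encoded_default)  # %24-_.%2B%21%2A%27%28%29%2C
print(encoded_keep)     # $-_.+!*'(),
print(unquote(encoded_default) == SPECIAL)  # True
```

Both forms decode back to the same string, so encoding these characters is harmless even though it is not required.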

expoter
    Side note: You probably want to use [RFC 3986](https://tools.ietf.org/html/rfc3986) (which is [the URI standard](https://tools.ietf.org/html/std66)) instead of RFC 1738. – unor Nov 26 '16 at 00:52
