When setting cookies, PHP URL-encodes the cookie value (at least when not using `setrawcookie`) and URL-decodes it before making it available to the application in `$_COOKIE`.

Is this an accepted standard? If I set a raw cookie value of `a%3Db`, would I get back `a=b` in most web programming languages (through their respective cookie-reading mechanisms)?
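For illustration, here is a minimal Python sketch of the round trip described above: percent-encode the value on write, percent-decode it on read. It uses Python's `urllib.parse`, not PHP's internals, so the helper names `encode_cookie_value` and `decode_cookie_value` are hypothetical and the exact set of escaped characters may differ from what `setcookie` produces.

```python
from urllib.parse import quote, unquote

def encode_cookie_value(value: str) -> str:
    """Percent-encode everything except letters, digits and "_.-~" (hypothetical helper)."""
    # safe="" means "/" is escaped as well; quote() uses uppercase hex digits.
    return quote(value, safe="")

def decode_cookie_value(raw: str) -> str:
    """Reverse the percent-encoding; a literal "+" is left as "+"."""
    return unquote(raw)

if __name__ == "__main__":
    stored = encode_cookie_value("a=b")
    print(stored)                       # a%3Db
    print(decode_cookie_value(stored))  # a=b
```

Whether you actually get `a=b` back depends on the reader: one that decodes like this does, one that hands back the raw header value does not.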

AndreKR
  • It's perfectly acceptable, even if it's [not _strictly_ mandatory](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#attributes). There are only a handful of values that _must_ be URL encoded, but just blanket URL encoding everything makes it much easier to work with cookies. – Mike 'Pomax' Kamermans Dec 31 '21 at 19:10
  • [What are allowed characters in cookies?](https://stackoverflow.com/questions/1969232/what-are-allowed-characters-in-cookies/1969339) was the question I was really looking for. – Boris Verkhovskiy Mar 25 '22 at 06:07



Yes. While it's not required by the spec, RFC 6265 says the following (the emphasis is in the original document, not added):

> To maximize compatibility with user agents, servers that wish to store arbitrary data in a cookie-value SHOULD encode that data, for example, using Base64 [RFC4648].

In my experience, most web frameworks and cookie libraries have methods for encoding and decoding cookie values. In many cases, especially in frameworks and high-level languages, this is abstracted away and done automatically.
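A minimal sketch of the RFC's suggestion, assuming the cookie has to carry arbitrary bytes: Base64-encode on write and decode on read. `pack` and `unpack` are hypothetical helper names; `base64.b64encode` and `base64.b64decode` are Python standard library functions.

```python
import base64

def pack(data: bytes) -> str:
    """Base64-encode arbitrary bytes for use as a cookie value (hypothetical helper)."""
    # Output uses only A-Z, a-z, 0-9, "+", "/" and "=".
    return base64.b64encode(data).decode("ascii")

def unpack(value: str) -> bytes:
    """Decode a value produced by pack(); validate=True rejects non-Base64 characters."""
    return base64.b64decode(value, validate=True)

payload = b"a=b; c,d \x00\xff"  # contains bytes a raw cookie value should not carry
value = pack(payload)
print(value)
assert unpack(value) == payload
```

Every character in the standard Base64 alphabet is allowed in a cookie value by RFC 6265, but as the comments below point out, a reader that URL-decodes may still turn `+` into a space.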

This answer provides a fairly detailed account of the history behind the values allowed in cookies. Might be of interest to you.

Mike 'Pomax' Kamermans
sytech
  • Sadly, as I recently found out through a lot of headache, Base64 includes the character "+". And there seems to be no way (not through a POST or GET request, and definitely not through the $_COOKIE array) to read a cookie in PHP with the "+" character intact. With Base64 so prevalent for transmitting binary data in "safe" characters, the inclusion of + (and "=" for that matter) in Base64 seems a major disconnect with browser development. Fortunately there are many other characters to choose from, so it's not difficult to substitute others for those few cases. But it's a PITA. – Randy Jul 03 '19 at 05:48
  • @Randy There is a URL-safe variant of the Base64 encoding, which avoids `+` and `/` (and the `=` padding is often omitted). – sstn Oct 23 '19 at 05:58
  • @sstn - Thanks. I ended up making my own short routine to do it. Besides the '+' and '=' characters giving me trouble, I wanted to be able to use the 'processed' name directly to create a temporary directory on the server. So it turned out the '/' character was also a problem (because it looks like a directory separator in a file-related call). Fortunately there were plenty of usable characters to choose from. – Randy Oct 24 '19 at 13:26
  • Late to this answer, but the question was whether they should be _URL encoded_, which is a specific scheme for escaping otherwise parser-breaking characters, whereas this answer is about whether cookies should be "encoded" in the sense of "transformed into something that cannot be trivially rewritten". The answer to the actual question is the short and sweet "only a handful of characters _must_ be URL encoded, but everyone uses blanket URL encoding because it makes things much easier than having to escape and then decode only a few specific things". – Mike 'Pomax' Kamermans Dec 31 '21 at 19:08
  • @Mike'Pomax'Kamermans yeah, good clarification. As I understand it, the question asks whether _URL-encoding_ the cookie value is a good/standard practice. Encoding the values is a standard (but not required) practice (to rephrase your statement: it is an easy way to have valid values and avoid a custom encoder/decoder). The specific encoding scheme varies by framework. For example, PHP uses URL encoding, while other frameworks may use Base64 (like the Flask framework) or some other scheme. Developers may also choose to format the value themselves (e.g. with PHP's `setrawcookie`). – sytech Dec 31 '21 at 19:22
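To make the `+` problem from the comments concrete, here is a small Python sketch. It assumes a reader that decodes with HTML-form rules (where `+` means space, which is how the comments describe `$_COOKIE` behaving); `urllib.parse.unquote_plus` stands in for such a reader and is not PHP's actual code. It then shows the URL-safe Base64 alphabet from RFC 4648 passing through unchanged.

```python
import base64
from urllib.parse import unquote_plus

data = b"\xfb\xff"  # two bytes chosen so the encodings contain "+" and "/"

standard = base64.b64encode(data).decode("ascii")          # "+/8="
url_safe = base64.urlsafe_b64encode(data).decode("ascii")  # "-_8="

# A form-style decoder treats "+" as a space, which corrupts standard Base64.
mangled = unquote_plus(standard)   # " /8="
# The URL-safe alphabet swaps "+" and "/" for "-" and "_", so it passes through.
survived = unquote_plus(url_safe)  # "-_8="

# The "=" padding can also be dropped and re-added before decoding.
unpadded = url_safe.rstrip("=")
restored = base64.urlsafe_b64decode(unpadded + "=" * (-len(unpadded) % 4))
print(standard, url_safe, repr(mangled), survived, restored == data)
```

Since the URL-safe alphabet also avoids `/`, it would likely cover the directory-name case Randy mentions as well.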

sytech's answer (which I have accepted) is certainly correct as it quotes the spec, but since the spec is rather vague, here's an overview of how some web frameworks actually handle the matter:

| Source | Cookie value encoding |
|---|---|
| RFC 6265 | "for example Base64" |
| PHP | URL encode |
| Go | raw |
| Node.js + Express | URL encode |
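For comparison, Python's standard-library parser `http.cookies.SimpleCookie` appears to belong in the "raw" category: apart from unquoting double-quoted values, it returns the value as sent, so URL-decoding is left to the application. A minimal sketch, assuming the browser sends back the header `k=a%3Db`:

```python
from http.cookies import SimpleCookie
from urllib.parse import unquote

header = "k=a%3Db"  # Cookie header value as the browser would send it back

jar = SimpleCookie()
jar.load(header)

raw = jar["k"].value    # "a%3Db": no URL decoding is applied
decoded = unquote(raw)  # "a=b": what a URL-decoding framework would hand you
print(raw, decoded)
```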
AndreKR

Stolen from NCZOnline:

> There is some confusion over encoding of a cookie value. The commonly held belief is that cookie values must be URL-encoded, but this is a fallacy even though it is the de facto implementation. The original specification indicates that only three types of characters must be encoded: semicolon, comma, and white space. The specification indicates that URL encoding may be used but stops short of requiring it. The RFC makes no mention of encoding whatsoever. Still, almost all implementations perform some sort of URL encoding on cookie values. In the case of name=value formats, the name and value are typically encoded separately while the equals sign is left as is.
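A minimal sketch of the convention in the last sentence of the quote: the name and the value are encoded separately and the `=` between them is left alone. `cookie_pair` is a hypothetical helper, and percent-encoding via `urllib.parse.quote` is just one choice of scheme.

```python
from urllib.parse import quote

def cookie_pair(name: str, value: str) -> str:
    """Build name=value with each side percent-encoded on its own (hypothetical helper)."""
    # Only the parts are escaped; the separating "=" stays literal.
    return f"{quote(name, safe='')}={quote(value, safe='')}"

print(cookie_pair("user id", "a=b; c"))  # user%20id=a%3Db%3B%20c
```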

Paul Sturm
  • I would not put any stock in what that article says. It pre-dates RFC 6265 and refers to RFCs that are long since obsolete. – Todd Menier Jul 28 '20 at 18:32
  • Note that as per the text in [RFC 6265, "HTTP State Management"](https://httpwg.org/specs/rfc6265.html#sane-set-cookie), this is factually incorrect. Control characters, all whitespace, and the four characters `"`, `,`, `;`, and `\` are all not permitted and must be escaped. – Mike 'Pomax' Kamermans Dec 31 '21 at 19:12
  • @Mike'Pomax'Kamermans do you have a source that these are illegal? They are being used on an external backend we're using... – James D Mar 15 '22 at 12:34
  • You mean the link in the comment you are responding to? – Mike 'Pomax' Kamermans Mar 15 '22 at 15:51
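For reference, here is a minimal Python check against the `cookie-octet` rule in RFC 6265 section 4.1.1, which excludes control characters, whitespace, `"`, `,`, `;` and backslash (`/` is allowed). That grammar describes what servers SHOULD send; the user-agent parsing algorithm in section 5.2 is more permissive, which may explain why a backend can get away with other characters. `is_valid_cookie_value` is a hypothetical helper name.

```python
def is_cookie_octet(ch: str) -> bool:
    """True if ch matches RFC 6265's cookie-octet rule."""
    code = ord(ch)
    return (
        code == 0x21
        or 0x23 <= code <= 0x2B
        or 0x2D <= code <= 0x3A
        or 0x3C <= code <= 0x5B
        or 0x5D <= code <= 0x7E
    )

def is_valid_cookie_value(value: str) -> bool:
    """Check: cookie-value = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]  # one pair of surrounding double quotes is allowed
    return all(is_cookie_octet(ch) for ch in value)

for candidate in ["a=b", "a/b", "a%3Db", "a b", "a;b", "a,b", 'a"b', "a\\b"]:
    print(repr(candidate), is_valid_cookie_value(candidate))
```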