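RFC 6265, Section 6.1, specifies that browsers should allow at least 4096 bytes per cookie (measured as the sum of the lengths of the cookie's name, value, and attributes).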
To know how many characters a cookie can hold, I need to know which character encoding is used for cookies, since the RFC specifies the maximum size per cookie in bytes, not characters.
How do I know the encoding being used to store cookies?
Is it determined by the character encoding of the programming language used to create the cookies (e.g. PHP, JavaScript), or by the character encoding used by the browser that stores them?
Update:
I conducted a few tests, and it appears that Firefox, Chrome and Opera use UTF-8 for cookie storage. The encoding therefore determines how many characters fit in a cookie: a character that needs more bytes in UTF-8 leaves room for fewer characters within the same byte limit.
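To see why a byte limit and a character count diverge, here is a minimal sketch that compares String.prototype.length (which counts UTF-16 code units) with the UTF-8 byte length produced by the standard TextEncoder API. The helper name utf8ByteLength is hypothetical, and the sketch assumes a runtime that provides TextEncoder (modern browsers, Node.js 11+).

```javascript
// TextEncoder always encodes to UTF-8 (available in browsers and Node.js 11+).
const encoder = new TextEncoder();

// Hypothetical helper: number of bytes a string occupies when UTF-8 encoded.
function utf8ByteLength(str) {
  return encoder.encode(str).length;
}

// The four test characters; "\u{1D18F}" is the 4-byte one.
for (const ch of ["1", "£", "畀", "\u{1D18F}"]) {
  // .length counts UTF-16 code units, not UTF-8 bytes
  console.log(ch, "code units:", ch.length, "UTF-8 bytes:", utf8ByteLength(ch));
}
```

For the four test characters this reports 1, 2, 3 and 4 UTF-8 bytes, while .length reports 1, 1, 1 and 2.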
Suspecting that the browsers use UTF-8 as the character encoding for cookies, I ran the tests here with a 1-byte UTF-8 character (1), a 2-byte UTF-8 character (£), a 3-byte UTF-8 character (畀), and a 4-byte UTF-8 character (𝆏, U+1D18F). The results are below.
Every cookie I set used a single-byte name, and the character counts below exclude both that name and the = character that separates the cookie name from the cookie value. The value in [] beside each character is its UTF-8 encoding in hex.
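Under one simple model, the counts reported below follow directly from a byte cap: if a browser limits the UTF-8 size of name + "=" + value, a value made of one repeated character can hold floor((limit - 2) / bytesPerChar) copies when the name is a single byte. This is a sketch under that assumption (cookie attributes are not counted, which may not match every browser); maxRepeats is a hypothetical helper name.

```javascript
// Hypothetical helper: how many copies of `ch` fit in a cookie value when
// the browser caps the UTF-8 size of name + "=" + value at `byteLimit`.
// Assumption: attributes (Path, Expires, ...) do not count toward the cap.
const encoder = new TextEncoder(); // encodes strings as UTF-8

function maxRepeats(byteLimit, ch, name = "a") {
  const overhead = encoder.encode(name + "=").length; // name bytes + 1 for "="
  const perChar = encoder.encode(ch).length;          // UTF-8 bytes per character
  return Math.floor((byteLimit - overhead) / perChar);
}

for (const ch of ["1", "£", "畀", "\u{1D18F}"]) {
  console.log(ch, "cap 4096:", maxRepeats(4096, ch), "cap 4097:", maxRepeats(4097, ch));
}
```

With a 4096-byte cap this yields 4094, 2047, 1364 and 1023; with 4097 it yields 4095, 2047, 1365 and 1023.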
FF 31.0
Firefox exceeds the RFC minimum by one byte, allowing 4097 bytes per cookie.
- 1-byte character (1, [0x31]) -- 4095 characters
- 2-byte character (£, [0xC2, 0xA3]) -- 2047 characters
- 3-byte character (畀, [0xE7, 0x95, 0x80]) -- 1365 characters
- 4-byte character (𝆏, [0xF0, 0x9D, 0x86, 0x8F]) -- 1023 characters
Chrome 36.0.1985.143
- 1-byte character (1, [0x31]) -- 4094 characters
- 2-byte character (£, [0xC2, 0xA3]) -- 2047 characters
- 3-byte character (畀, [0xE7, 0x95, 0x80]) -- 1364 characters
- 4-byte character (𝆏, [0xF0, 0x9D, 0x86, 0x8F]) -- 1023 characters
Opera 24.0.1558.17
- 1-byte character (1, [0x31]) -- 4094 characters
- 2-byte character (£, [0xC2, 0xA3]) -- 2047 characters
- 3-byte character (畀, [0xE7, 0x95, 0x80]) -- 1364 characters
- 4-byte character (𝆏, [0xF0, 0x9D, 0x86, 0x8F]) -- 1023 characters
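Working backwards, the same assumption (a cap on the UTF-8 bytes of a 1-byte name, "=", and the value) lets you check which caps are consistent with a browser's measurements. Accepting n characters of b bytes means the cap is at least 2 + n*b, and rejecting n + 1 means it is at most 2 + (n + 1)*b - 1; intersecting those ranges over all four characters gives the feasible caps. A sketch under that model; inferCapRange is a hypothetical name.

```javascript
// Infer which per-cookie byte caps are consistent with the observed maxima.
// Assumption: the cap counts UTF-8 bytes of a 1-byte name, "=", and the value.
// For each test, `count` characters of `bytesPerChar` bytes were accepted
// and count + 1 were not, so:
//   overhead + count*b <= cap <= overhead + (count + 1)*b - 1
function inferCapRange(observations, overhead = 2) {
  let lo = -Infinity;
  let hi = Infinity;
  for (const { bytesPerChar, count } of observations) {
    lo = Math.max(lo, overhead + count * bytesPerChar);
    hi = Math.min(hi, overhead + (count + 1) * bytesPerChar - 1);
  }
  return lo <= hi ? [lo, hi] : null; // null: no single byte cap fits the data
}

const firefox = [
  { bytesPerChar: 1, count: 4095 },
  { bytesPerChar: 2, count: 2047 },
  { bytesPerChar: 3, count: 1365 },
  { bytesPerChar: 4, count: 1023 },
];
const chromeOpera = [
  { bytesPerChar: 1, count: 4094 },
  { bytesPerChar: 2, count: 2047 },
  { bytesPerChar: 3, count: 1364 },
  { bytesPerChar: 4, count: 1023 },
];

console.log(inferCapRange(firefox));     // returns [4097, 4097]
console.log(inferCapRange(chromeOpera)); // returns [4096, 4096]
```

Both ranges collapse to a single value (4097 bytes for Firefox, 4096 for Chrome and Opera). Feeding in the IE counts below returns null, because no UTF-8 byte cap can accept 5115 characters of both 1 and 2 bytes.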
IE 8.0.6001.19518
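IE also exceeds the RFC minimum, allowing 5117 characters per cookie (it counts characters rather than bytes, as the results below show), and it additionally enforces a limit on the total size of all cookies for a domain (here, the limit found was 10234 characters).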
- 1-byte character (1, [0x31]) -- 5115 characters
- 2-byte character (£, [0xC2, 0xA3]) -- 5115 characters
- 3-byte character (畀, [0xE7, 0x95, 0x80]) -- 5115 characters
- 4-byte character (𝆏, [0xF0, 0x9D, 0x86, 0x8F]) -- 2557 characters
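IE's figures fit a different model: a per-cookie cap of 5117 counted in UTF-16 code units (what JavaScript's .length reports) rather than UTF-8 bytes. A minimal sketch under that assumption; IE_COOKIE_CAP and ieMaxRepeats are hypothetical names, and 5117 is simply the value measured above.

```javascript
// Assumption: IE 8 caps name + "=" + value at 5117 UTF-16 code units,
// i.e. it measures strings the way String.prototype.length does.
const IE_COOKIE_CAP = 5117; // hypothetical constant holding the observed cap

// Hypothetical helper: copies of `ch` that fit under the code-unit cap.
function ieMaxRepeats(ch, name = "a") {
  const overhead = (name + "=").length; // code units for the name and "="
  return Math.floor((IE_COOKIE_CAP - overhead) / ch.length);
}

for (const ch of ["1", "£", "畀", "\u{1D18F}"]) {
  console.log(ch, "code units:", ch.length, "max repeats:", ieMaxRepeats(ch));
}
```

It gives 5115 for 1, £ and 畀 alike and 2557 for 𝆏, matching the IE results.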
Note on IE:
IE seems to use ECMAScript's notion of a character. ECMAScript exposes strings as sequences of 16-bit unsigned integers (the encoding can be either UTF-16 or UCS-2 and is left as an implementation choice). The 4-byte character chosen for the tests (𝆏, U+1D18F) takes two 16-bit code units (a surrogate pair) in UTF-16, and since ECMAScript counts each 16-bit unit as a character, "𝆏".length === 2 returns true.
This is why IE counts 𝆏 as two characters, and why only 2557 of them fit in a cookie, roughly half of the 5115 allowed for the other test characters.
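The difference between code units and code points can be observed directly in any ECMAScript engine. This sketch uses only standard String methods (charCodeAt, codePointAt) and the string iterator, which walks code points rather than code units; the variable name fourByteChar is illustrative.

```javascript
// String.prototype.length counts UTF-16 code units, so a character outside
// the Basic Multilingual Plane (stored as a surrogate pair) counts as two.
const fourByteChar = "\u{1D18F}"; // the 4-byte UTF-8 test character

console.log(fourByteChar.length);                      // 2 (code units)
console.log([...fourByteChar].length);                 // 1 (iterator yields code points)
console.log(fourByteChar.charCodeAt(0).toString(16));  // d834 (high surrogate)
console.log(fourByteChar.charCodeAt(1).toString(16));  // dd8f (low surrogate)
console.log(fourByteChar.codePointAt(0).toString(16)); // 1d18f (the code point)
```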