
Having read Joel on Encoding like a good boy, I find myself perplexed by the workings of Foundation's JSONDecoder: neither its init nor its decode methods take an encoding value. Looking through the docs, I see the instance property dataDecodingStrategy; perhaps this is where the encoding-guessing magic happens...?

Am I missing something here? Shouldn't JSONDecoder need to know the encoding of the data it receives? I realize that the JSON standard requires this data to be UTF-8 encoded, but can JSONDecoder be making that assumption? I'm confused.

pseudosudo
  • Presumably, if none is provided, it will use a default value? The link you provide states that "The default strategy is the JSONDecoder.DataDecodingStrategy.base64 strategy". That would be your default decoding strategy. If decoding fails, it fails because the input is not using the same strategy. In that case, you would need to specify which strategy you mean to use to handle the JSON payload – blurfus Mar 01 '19 at 00:44
  • "I realize that the JSON standard requires this data to be UTF-8 encoded, but can `JSONDecoder` be making that assumption?" I don't see why not. If it's not UTF-8, then it's not JSON. It's similar to JSON, but it's not JSON. – Alexander Mar 01 '19 at 00:49
  • dataDecodingStrategy has nothing to do with the way the string is encoded. It just lets you choose how the Data values in your object are converted to and from a UTF-8 string (base64 is the default) so that they can be sent and received as JSON (which is always a string encoded as UTF-8 data, even if you send Data inside your structure) – Leo Dabus Mar 01 '19 at 00:54
  • @Alexander: Older JSON standards allowed also UTF-16 and UTF-32, and JSONDecoder can handle that. – Martin R Mar 01 '19 at 02:23
  • I had problems with a JSON sent by server that is encoded in ASCII. Fixed it by constructing a string from the JSON using ASCII, then get the data using UTF-8. – Khanh Nguyen Jan 20 '22 at 23:52
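
To illustrate the point made in the comments about dataDecodingStrategy: it governs how Data-typed properties are represented inside the JSON, not the text encoding of the JSON itself. The following is a minimal sketch only; the Attachment type and the sample payload are hypothetical and not part of the original question.

```swift
import Foundation

// Hypothetical model type with a Data property, for illustration only.
struct Attachment: Decodable {
    let name: String
    let payload: Data
}

// "aGVsbG8=" is the Base64 form of the ASCII bytes of "hello".
// The JSON text itself is UTF-8 here.
let attachmentJSON = Data(#"{"name": "greeting", "payload": "aGVsbG8="}"#.utf8)

let attachmentDecoder = JSONDecoder()
// .base64 is the documented default; it is set explicitly to make the point.
attachmentDecoder.dataDecodingStrategy = .base64

let attachment = try! attachmentDecoder.decode(Attachment.self, from: attachmentJSON)
// The Base64 string inside the JSON has become raw bytes.
let payloadText = String(decoding: attachment.payload, as: UTF8.self)
print(attachment.name, payloadText)
```

The strategy only affects the "payload" field; it plays no part in deciding how the bytes of the JSON text are interpreted.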

1 Answer


RFC 8259 (from 2017) requires that

JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded using UTF-8.

The older RFC 7159 (from 2014) and RFC 7158 (from 2013) only stated that

JSON text SHALL be encoded in UTF-8, UTF-16, or UTF-32. The default encoding is UTF-8, and JSON texts that are encoded in UTF-8 are interoperable in the sense that they will be read successfully by the maximum number of implementations; there are many implementations that cannot successfully read texts in other encodings (such as UTF-16 and UTF-32).

And RFC 4627 (from 2006, the oldest one that I could find):

JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

Since the first two characters of a JSON text will always be ASCII characters, it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
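
The null-pattern rule quoted above can be sketched in a few lines of Swift. This is an illustration of the RFC 4627 table only, not a description of how JSONSerialization is actually implemented; the function name guessJSONEncoding is made up for this example, and the sketch ignores byte order marks and inputs shorter than four bytes.

```swift
import Foundation

/// Guesses the Unicode encoding of a JSON text from the pattern of
/// zero bytes in its first four octets (RFC 4627, section 3).
/// Returns nil if fewer than four bytes are available or no pattern matches.
/// Hypothetical helper, for illustration only.
func guessJSONEncoding(of data: Data) -> String.Encoding? {
    guard data.count >= 4 else { return nil }
    let b = [UInt8](data.prefix(4))
    // Each tuple element is true where the corresponding octet is zero.
    switch (b[0] == 0, b[1] == 0, b[2] == 0, b[3] == 0) {
    case (true, true, true, false):    return .utf32BigEndian    // 00 00 00 xx
    case (true, false, true, false):   return .utf16BigEndian    // 00 xx 00 xx
    case (false, true, true, true):    return .utf32LittleEndian // xx 00 00 00
    case (false, true, false, true):   return .utf16LittleEndian // xx 00 xx 00
    case (false, false, false, false): return .utf8              // xx xx xx xx
    default:                           return nil
    }
}

// Usage: the explicit-endian encodings do not prepend a byte order mark.
let sample = "[1, 2, 3]".data(using: .utf16LittleEndian)!
let guessed = guessJSONEncoding(of: sample)
print(guessed == .utf16LittleEndian)
```

The rule relies on the first two characters being ASCII, which holds for RFC 4627 texts because they must be an object or an array.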

JSONDecoder (which uses JSONSerialization under the hood) is able to decode UTF-8, UTF-16, and UTF-32, both little-endian and big-endian. Example:

import Foundation

let data = "[1, 2, 3]".data(using: .utf16LittleEndian)!
print(data as NSData) // <5b003100 2c002000 32002c00 20003300 5d00>

let a = try! JSONDecoder().decode([Int].self, from: data)
print(a) // [1, 2, 3]

Since a JSON text as defined by RFC 4627 must start with "[" or "{", the encoding can be determined unambiguously from the first bytes of the data.

I did not find this documented though, and one probably should not rely on it. A future implementation of JSONDecoder might support only the newer standard and require UTF-8.
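
If the source encoding is known out of band, one way to avoid relying on the undocumented detection is to transcode to UTF-8 before decoding. The sketch below assumes the sender's encoding is known (ISO Latin-1 here, purely as an example); transcodeToUTF8 is a hypothetical helper, not a Foundation API.

```swift
import Foundation

/// Hypothetical helper: re-encodes `data` from a known source encoding to UTF-8.
/// Returns nil if the bytes cannot be interpreted in `sourceEncoding`.
func transcodeToUTF8(_ data: Data, from sourceEncoding: String.Encoding) -> Data? {
    guard let text = String(data: data, encoding: sourceEncoding) else { return nil }
    return text.data(using: .utf8)
}

// Example payload that is assumed to arrive as ISO Latin-1.
let latin1Payload = "[\"café\"]".data(using: .isoLatin1)!

if let utf8Payload = transcodeToUTF8(latin1Payload, from: .isoLatin1),
   let words = try? JSONDecoder().decode([String].self, from: utf8Payload) {
    print(words)
} else {
    print("could not transcode or decode the payload")
}
```

This keeps the decoder on the one encoding that every JSON standard accepts, at the cost of an extra copy of the data.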

Martin R
  • Though to be honest I was imagining the OP meant something like "why can't it be Latin-1 as long as we tell the decoder?" – matt Mar 01 '19 at 01:56