The problem is in the source data: the JSON sequence "\u00e2\u0080\u0099"
does not represent a right closing quotation mark. There are three Unicode code points here: the first, U+00E2, represents "â", while the other two (U+0080 and U+0099) are C1 control characters.
You can verify this in a dev console, or by running the snippet below (the backslashes are escaped so the \u sequences reach JSON.parse literally):
console.log(JSON.parse('"\\u00e2\\u0080\\u0099"'));
Apparently the author of that JSON mixed up two things:
- JSON is encoded in UTF
- A \u notation represents a Unicode code point
The first means that the file or stream encoding the JSON text into bytes should be UTF-encoded (preferably UTF-8). The second has nothing to do with that: JSON syntax allows specifying 16-bit Unicode code points using the \u notation. It is not intended to produce a UTF-8 byte sequence with a sequence1 of \u encodings. One should not be concerned with the lower-level UTF-8 byte stream encoding when writing JSON text.
1 Surrogate pairs deserve a mention here, but they are unrelated to UTF-8; they concern how Unicode code points beyond the 16-bit range can be encoded in JSON.
So although the right closing quotation mark has the UTF-8 byte sequence E2 80 99, those three bytes are not to be encoded with a \u notation each.
The right closing quotation mark is Unicode code point U+2019, written \u2019 in JSON. So either the source JSON should have that escape, or it should just have the character ’ literally (which will indeed become a UTF-8 sequence in the byte stream, but that is a level below JSON).
Here are the two possibilities:
console.log(JSON.parse('"’"'));
console.log(JSON.parse('"\u2019"'));
What now?
I would advise you to contact the service provider of this particular API: they have a bug in their JSON-producing service.
Whatever you do, do not try to fix this in the client that consumes this service by recognising such malformed sequences and replacing them as if those characters represented UTF-8 bytes. Such a fix will be hard to maintain, and may even hit false positives.