Why does parsing these code points result in multiple characters

Question

When I parse this:

JSON.parse('"\\u1f469\\u200d\\u1f469\\u200d\\u1f466"')

I end up with multiple characters:

὆9‍὆9‍὆6

But when I parse this:

JSON.parse('"\\uD83D\\uDC69\\u200D\\u2764\\uFE0F\\u200D\\uD83D\\uDC69"')

It produces 👩‍❤️‍👩. Both run in Chrome. The first one is a valid zero-width-joiner (ZWJ) emoji sequence. Why does the first one not produce the combined emoji?
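The second string parses correctly because each WOMAN (U+1F469) is spelled as the UTF-16 surrogate pair \uD83D\uDC69. Below is a minimal sketch of how such a pair is derived, using a hypothetical helper `toJsonEscape` (not part of any library). The method is to subtract 0x10000, then split the remaining 20 bits into a high surrogate (0xD800 + top 10 bits) and a low surrogate (0xDC00 + bottom 10 bits).

```javascript
// Hypothetical helper: turn a Unicode code point into the \uXXXX escape(s)
// that JSON accepts. Code points above U+FFFF become a surrogate pair.
function toJsonEscape(codePoint) {
  const hex = (n) => n.toString(16).toUpperCase().padStart(4, "0");
  if (codePoint <= 0xffff) {
    return "\\u" + hex(codePoint); // fits in a single 16-bit escape
  }
  const offset = codePoint - 0x10000;      // 20-bit value
  const high = 0xd800 + (offset >> 10);    // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff);   // bottom 10 bits -> low surrogate
  return "\\u" + hex(high) + "\\u" + hex(low);
}

console.log(toJsonEscape(0x1f469)); // \uD83D\uDC69 (WOMAN)
console.log(toJsonEscape(0x1f466)); // \uD83D\uDC66 (BOY)
console.log(toJsonEscape(0x200d));  // \u200D (ZERO WIDTH JOINER)
```

The same arithmetic covers every code point from U+10000 to U+10FFFF; anything at or below U+FFFF needs only one escape.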

This is a known limitation in JavaScript, whose strings are made of 16-bit code units: characters that need more than 2 bytes (code points above U+FFFF) have to be encoded with a surrogate pair. — trincot, Aug 27 '18 at 11:17
If it's a limitation in JavaScript, how is this web page able to display the character correctly: https://emojipedia.org/family-woman-woman-boy/ — Johann, Aug 27 '18 at 11:24
Where do you see that that website does it via JS? All I see is that the HTML document is UTF encoded and has the character hard-coded literally in the document without any JS involvement. The bigger images on that page are not characters, but images. — trincot, Aug 27 '18 at 11:53
NB: your question has little to do with `JSON.parse`, as `JSON.parse('"\\uD83D\\uDC69"')` is the same string as `"\uD83D\uDC69"`. — trincot, Aug 27 '18 at 12:06
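To illustrate the point about `JSON.parse`, here is a small sketch showing that the parsed string is identical to the same surrogate pair written as a JavaScript string literal, to the ES2015 `\u{1F469}` code point escape, and to `String.fromCodePoint(0x1F469)`. Note that `\u{...}` is JavaScript syntax only. JSON has just the four-digit `\uXXXX` form, so JSON text has to use the surrogate pair.

```javascript
// The same UTF-16 string written four ways; JSON.parse adds nothing special.
const fromJson = JSON.parse('"\\uD83D\\uDC69"');
const fromEscapes = "\uD83D\uDC69";              // surrogate pair in a JS literal
const fromBraces = "\u{1F469}";                  // ES2015 escape (JS only, not JSON)
const fromCodePoint = String.fromCodePoint(0x1f469);

console.log(fromJson === fromEscapes && fromEscapes === fromBraces && fromBraces === fromCodePoint); // true
console.log(fromJson.length);      // 2 (UTF-16 code units)
console.log([...fromJson].length); // 1 (string iteration goes by code point)
```

The `length` of 2 is the limitation described in the first comment: the string stores two 16-bit code units, while spreading it iterates by code point and sees a single character.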

score 0 · Answer 1 · answered Aug 27 '18 at 11:16

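A \u escape consumes exactly the 4 hex digits after the 'u'. In the example you ask about, each escape has 5 hex digits, so only the first 4 (1f46) become a character (U+1F46), and the 5th digit (9, 9 and 6 respectively) is treated as a normal character. To get U+1F469 and U+1F466, you have to write them as surrogate pairs, as in your second example.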

JSON.parse('"\\u1f469"') // => "\u1f46" followed by "9", i.e. the two characters ὆ and 9
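Here is a short sketch of what this answer describes, plus the corrected input for the family emoji. In the fix, WOMAN (U+1F469) and BOY (U+1F466) are written as surrogate pairs; `String.fromCodePoint` is used only to confirm the result.

```javascript
// What JSON.parse does with "\u1f469": the escape consumes exactly four hex
// digits (1f46), and the trailing "9" stays a literal character.
const broken = JSON.parse('"\\u1f469"');
console.log(broken.length);                     // 2
console.log(broken.charCodeAt(0).toString(16)); // 1f46
console.log(broken[1]);                         // 9

// The intended sequence WOMAN, ZWJ, WOMAN, ZWJ, BOY, with each code point
// above U+FFFF written as a surrogate pair.
const family = JSON.parse(
  '"\\uD83D\\uDC69\\u200D\\uD83D\\uDC69\\u200D\\uD83D\\uDC66"'
);
console.log(family === String.fromCodePoint(0x1f469, 0x200d, 0x1f469, 0x200d, 0x1f466)); // true
```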

