Why '❌'[0] === '❌' but '✔️'[0] !== '✔️'?

Question

'❌'[0] === '❌' // true
'✔️'[0] === '✔️' // false
'✔️'[0] === '✔'  // true

I suspect it's Unicode-related, but I would like to understand precisely what is happening and how I can correctly compare such characters. Why is '✔️' treated differently than '❌'?

I ran into it in this simple character count:

'✔️❌✔️❌'.split('').filter(e => e === '❌').length // 2
'✔️❌✔️❌'.split('').filter(e => e === '✔️').length // 0

I think the check-mark is a two-character sequence, while the "X" is not. — Pointy, Oct 12 '21 at 23:55
You encountered a surrogate pair: https://stackoverflow.com/questions/31986614/what-is-a-surrogate-pair — Adelin, Oct 13 '21 at 00:01
A thorough explanation is here: [Emojis in Javascript](https://medium.com/reactnative/emojis-in-javascript-f693d0eb79fb), found via this question: [How to convert one emoji character to Unicode codepoint number in JavaScript?](https://stackoverflow.com/questions/48419167/how-to-convert-one-emoji-character-to-unicode-codepoint-number-in-javascript) — pilchard, Oct 13 '21 at 00:12
It’s not surrogate pairs; this is a _grapheme cluster_ made out of the `U+2714 HEAVY CHECK MARK` and the `U+FE0F VARIATION SELECTOR-16`. — Sebastian Simon, Oct 13 '21 at 00:15

score 5 · Accepted Answer · answered Oct 12 '21 at 23:55

Because ✔️ takes two UTF-16 code units: "✔️".length === 2

"✔️"[0] === "✔", and "✔️"[1] is U+FE0F VARIATION SELECTOR-16, which requests the colored emoji presentation of the preceding character.

And "❌".length === 1, so it takes only one code unit.
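
The lengths above can be checked by listing the code points of each string. This is only a small sketch: the helper name `codePoints` is made up for illustration, and the strings are written with escapes so the invisible selector is explicit.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
const codePoints = (s) =>
  [...s].map((ch) => 'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));

const check = '\u2714\uFE0F'; // HEAVY CHECK MARK + VARIATION SELECTOR-16
const cross = '\u274C';       // CROSS MARK

console.log(check.length, codePoints(check)); // 2 [ 'U+2714', 'U+FE0F' ]
console.log(cross.length, codePoints(cross)); // 1 [ 'U+274C' ]
console.log(check[0] === '\u2714');           // true
```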

This is similar to how emoji with different skin tones work: a base character followed by a modifier code point.

As to how to compare, I think that "✔️".codePointAt(0) (not to be confused with charCodeAt()) might help, although it returns only the first code point (U+2714) and not the variation selector. See https://thekevinscott.com/emojis-in-javascript/:

codePointAt and fromCodePoint are new methods introduced in ES2015 that can handle unicode characters whose UTF-16 encoding is greater than 16 bits, which includes emojis. Use these instead of charCodeAt, which doesn’t handle emoji correctly.
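
For the counting case in the question, one option is to split by grapheme cluster rather than by code unit. The sketch below assumes a runtime that provides `Intl.Segmenter` (recent Node.js versions and browsers); `countGrapheme` is a hypothetical helper name, not something from the linked article.

```javascript
// Hypothetical helper: count how often `target` appears as a whole
// user-perceived character (grapheme cluster) in `text`.
// Assumes Intl.Segmenter is available in the runtime.
function countGrapheme(text, target) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (segment === target) count++;
  }
  return count;
}

const check = '\u2714\uFE0F'; // check mark with VARIATION SELECTOR-16
const cross = '\u274C';       // cross mark
const text = check + cross + check + cross;

console.log(countGrapheme(text, cross)); // 2
console.log(countGrapheme(text, check)); // 2
console.log(check.codePointAt(0).toString(16)); // 2714 (only the first code point)
```

Counting the bare `'\u2714'` with this helper gives 0 for this string, because every check mark in it carries the selector and is segmented together with it.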

answered Oct 12 '21 at 23:55

Maxim Mazurok


Thanks for the explanation. Just checked that '👨‍👩‍👦'.length === 8. Ugh. That's enough programming for today :D – Wilhelm Olejnik Oct 13 '21 at 00:11
@WilhelmOlejnik 👨‍👩‍👦 is 8 UTF-16 code units in length: `0xD83D 0xDC68 0x200D 0xD83D 0xDC69 0x200D 0xD83D 0xDC66`, which when translated to Unicode is 5 code points: `U+1F468 MAN` + `U+200D ZERO WIDTH JOINER` + `U+1F469 WOMAN` + `U+200D ZERO WIDTH JOINER` + `U+1F466 BOY`, aka `FAMILY: MAN, WOMAN, BOY` – Remy Lebeau Oct 15 '21 at 00:40
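
The breakdown in the comment above can be reproduced with a short sketch. The variable names are illustrative only, and the string is built from escapes so the zero width joiners are visible in the source.

```javascript
// man + ZWJ + woman + ZWJ + boy
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F466}';

// UTF-16 code units: what .length and [i] operate on.
const units = Array.from({ length: family.length }, (_, i) =>
  '0x' + family.charCodeAt(i).toString(16).toUpperCase());

// Code points: what string iteration (spread, for...of) yields.
const points = [...family].map((ch) =>
  'U+' + ch.codePointAt(0).toString(16).toUpperCase());

console.log(family.length);    // 8
console.log(units.join(' '));  // 0xD83D 0xDC68 0x200D 0xD83D 0xDC69 0x200D 0xD83D 0xDC66
console.log(points.join(' ')); // U+1F468 U+200D U+1F469 U+200D U+1F466
```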

score 3 · Answer 2 · answered Oct 12 '21 at 23:56

I believe the '✔️' is made up of 2 components. When you output '✔️'[0] you get '✔', and the black checkmark does not equal the green checkmark.

However, the '❌' is made up of just a single component, so when you output '❌'[0], you get the same thing: '❌'.

score 3 · Answer 3 · answered Oct 13 '21 at 00:50

The second char, '✔️'[1] (code point 65039, U+FE0F), is a Variation Selector:

A Variation Selector specifies that the preceding character should be displayed with emoji presentation. Only required if the preceding character defaults to text presentation.

Often used in Emoji ZWJ Sequences, where one or more characters in the sequence have text and emoji presentation, but otherwise default to text (black and white) display.

Examples: Snowman as text: ☃. Snowman as emoji: ☃️

Black Heart as text: ❤. Black Heart as Emoji: ❤️ (not so black)

Variation Selector-16 was approved as part of Unicode 3.2 in 2002.

https://unicode-table.com/en/FE0F/
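
If the goal is to treat the text and emoji presentations of a character as equal, one possible approach is to drop the selector before comparing. This is a sketch, not a general emoji-comparison solution; `stripVS16` is a hypothetical helper, and it only handles U+FE0F.

```javascript
const VS16 = '\uFE0F'; // VARIATION SELECTOR-16, code point 65039

const textCheck = '\u2714';         // check mark, text presentation
const emojiCheck = '\u2714' + VS16; // check mark, emoji presentation requested

console.log(emojiCheck.charCodeAt(1)); // 65039
console.log(emojiCheck === textCheck); // false

// Hypothetical helper: remove every VS16 so both presentations compare equal.
const stripVS16 = (s) => s.replace(/\uFE0F/g, '');

console.log(stripVS16(emojiCheck) === textCheck); // true
```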
