When I try to evaluate this expression in the console, I get false as the result. Why?

console.log('\u{1D11E}'.charAt(0) === '\u{1D11E}')
Json Prime
  • Please fix your question. That's just an unformatted line of invalid syntax. – Andreas Jul 22 '20 at 07:08
  • Because charAt can only handle UTF-16 code units. – Keith Jul 22 '20 at 07:08
  • Because `charAt()` originally was designed to support BMP only, and the character you gave does not belong to BMP. Check [this section](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charAt#Getting_whole_characters) for details (and a remedy). – raina77ow Jul 22 '20 at 07:11
  • Similar question here: https://stackoverflow.com/q/46157867/3650856. You compare a full Unicode character on the right side with only part of it on the left side: charAt(0) splits the character apart, since it is longer than one code unit. Therefore they will not be the same value, only the same type. – KayaNatsumi Jul 22 '20 at 07:16
  • @raina77ow another option is to use `Array.from` on the string and index into that (sketched after these comments). Not sure why MDN didn't mention that. – Keith Jul 22 '20 at 07:19
  • Also more information here: https://mathiasbynens.be/notes/javascript-unicode – KayaNatsumi Jul 22 '20 at 07:20
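
A minimal sketch of the `Array.from` approach from the comments: the string iterator (used by Array.from, for...of, and the spread operator) walks code points rather than UTF-16 code units, so indexing into the resulting array yields whole characters, even outside the BMP.

const s = '\u{1D11E}';               // MUSICAL SYMBOL G CLEF, U+1D11E
console.log(s.charAt(0));            // a lone high surrogate, '\uD834'
console.log(Array.from(s)[0]);       // the whole character, "𝄞"
console.log(Array.from(s)[0] === s); // true
console.log([...s][0] === s);        // true, spread iterates the same way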

2 Answers


A simple console.log shows you the problem:

console.log('\u{1D11E}'.charAt(0))
console.log('\u{1D11E}')
console.log('\u{1D11E}'.charAt(0) === '\u{1D11E}')

As you can see, they don't give the same result. That's because charAt only handles single UTF-16 code units. See the code snippet on the MDN page linked in the comments for how to handle whole characters, including those on other planes (i.e. with code points > 65535).
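
The linked MDN section shows one fix; a shorter modern sketch (an alternative, not the exact MDN snippet) uses codePointAt, which reads the full code point by combining the surrogate pair, and String.fromCodePoint, which rebuilds a comparable string from it:

const s = '\u{1D11E}';
console.log(s.codePointAt(0).toString(16));                // "1d11e", the full code point
console.log(String.fromCodePoint(s.codePointAt(0)) === s); // true, unlike charAt(0)
for (const ch of s) {
  console.log(ch === s); // true: for...of iterates whole code points
}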

Alexandre Elshobokshy
  • Hi, but when I evaluate '\u{1D11E}'.chartAt(0) === '\uD834', the outcome is true – Json Prime Jul 22 '20 at 07:14
  • @JsonPrime same idea, console.log each one of them and you'll see why. They both return � – Alexandre Elshobokshy Jul 22 '20 at 07:20
  • OK, since they both return �, the result of '\u{1D11E}'.chartAt(0) === '\uD834' is true, am I right? – Json Prime Jul 22 '20 at 07:24
  • @JsonPrime it's `charAt` not `chartAt`, and yes – Alexandre Elshobokshy Jul 22 '20 at 07:27
  • The explanation is wrong: `'\u{1D11E}'` is a UTF-16 character (or better, a UTF-16 code point). MDN uses **code unit** (which is a seldom-used notation in Unicode). In this case, `charAt(0)` gives a code unit, which is a surrogate, so not a valid UTF-16 code point (or character). – Giacomo Catenazzi Jul 22 '20 at 07:46
  • @GiacomoCatenazzi feel free to edit my answer. Thanks for the feedback! – Alexandre Elshobokshy Jul 22 '20 at 07:49
  • @GiacomoCatenazzi Hi, what I get from your feedback is that `\u{1D11E}` is a UTF-16 character which `charAt(0)` can handle, but the problem is that `\u{1D11E}` does not have a valid code point? – Json Prime Jul 22 '20 at 08:46
  • @JsonPrime: no/maybe. `\u{1D11E}` is a valid code point; `charAt` doesn't necessarily return valid code points (just a code unit, which may be half of a code point, for code points > `0xFFFF`). If you need the valid code point, you should check the example/fix in the linked MDN page. – Giacomo Catenazzi Jul 22 '20 at 08:50

'\u{1D11E}' is a string consisting of a single Unicode code point, U+1D11E. JavaScript strings are encoded in UTF-16, so each char in the string is a UTF-16 code unit. Thus charAt() returns a code unit, not a code point.

U+1D11E is encoded in UTF-16 as 0xD834 0xDD1E, so the string '\u{1D11E}' is actually '\uD834\uDD1E', thus:

'\u{1D11E}'.charAt(0) === '\u{1D11E}' // false
// aka: '\uD834' === '\u{1D11E}'

and

'\u{1D11E}'.charAt(0) === '\uD834' // true
// aka: '\uD834' === '\uD834'
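
A quick sketch to verify those code units in the console:

const s = '\u{1D11E}';
console.log(s.length);                      // 2, two UTF-16 code units
console.log(s.charCodeAt(0).toString(16));  // "d834", the high surrogate
console.log(s.charCodeAt(1).toString(16));  // "dd1e", the low surrogate
console.log(s.codePointAt(0).toString(16)); // "1d11e", the full code point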
Remy Lebeau