When I try to evaluate this expression in the console, I get false as the result. Why?

console.log('\u{1D11E}'.charAt(0) === '\u{1D11E}')
Json Prime
  • Please fix your question. That's just an unformatted line of invalid syntax. – Andreas Jul 22 '20 at 07:08
  • Because charAt can only handle UTF-16 code units. – Keith Jul 22 '20 at 07:08
  • Because `charAt()` originally was designed to support BMP only, and the character you gave does not belong to BMP. Check [this section](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/charAt#Getting_whole_characters) for details (and a remedy). – raina77ow Jul 22 '20 at 07:11
  • Similar question here: https://stackoverflow.com/q/46157867/3650856. You compare a full Unicode character on the right side with only part of it on the left side: charAt(0) splits the character apart, since it is longer than one code unit. Therefore they will not be the same value, only the same type. – KayaNatsumi Jul 22 '20 at 07:16
  • @raina77ow another option is to use `Array.from` on the string and index into that (sketched after these comments). Not sure why MDN didn't mention that. – Keith Jul 22 '20 at 07:19
  • Also more information here: https://mathiasbynens.be/notes/javascript-unicode – KayaNatsumi Jul 22 '20 at 07:20
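
A minimal sketch of the `Array.from` approach from the comments: the string iterator (used by Array.from, for...of, and the spread operator) walks code points rather than UTF-16 code units, so indexing into the resulting array yields whole characters, even outside the BMP.

const s = '\u{1D11E}';               // MUSICAL SYMBOL G CLEF, U+1D11E
console.log(s.charAt(0));            // a lone high surrogate, '\uD834'
console.log(Array.from(s)[0]);       // the whole character, "𝄞"
console.log(Array.from(s)[0] === s); // true
console.log([...s][0] === s);        // true, spread iterates the same way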

2 Answers


A simple console.log shows you the problem:

console.log('\u{1D11E}'.charAt(0))
console.log('\u{1D11E}')
console.log('\u{1D11E}'.charAt(0) === '\u{1D11E}')

As you can see, they don't give the same result. That's because charAt only handles single UTF-16 code units. See the code snippet on the MDN page linked in the comments for how to handle whole characters, including those on other planes (i.e. with code points > 65535).
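
The linked MDN section shows one fix; a shorter modern sketch (an alternative, not the exact MDN snippet) uses codePointAt, which reads the full code point by combining the surrogate pair, and String.fromCodePoint, which rebuilds a comparable string from it:

const s = '\u{1D11E}';
console.log(s.codePointAt(0).toString(16));                // "1d11e", the full code point
console.log(String.fromCodePoint(s.codePointAt(0)) === s); // true, unlike charAt(0)
for (const ch of s) {
  console.log(ch === s); // true: for...of iterates whole code points
}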

Alexandre Elshobokshy
  • Hi, but when I evaluate '\u{1D11E}'.chartAt(0) === '\uD834', the outcome is true – Json Prime Jul 22 '20 at 07:14
  • @JsonPrime same idea, console.log each one of them and you'll see why. They both return � – Alexandre Elshobokshy Jul 22 '20 at 07:20
  • OK, since they both return �, the result of '\u{1D11E}'.chartAt(0) === '\uD834' is true, am I right? – Json Prime Jul 22 '20 at 07:24
  • @JsonPrime it's `charAt` not `chartAt`, and yes – Alexandre Elshobokshy Jul 22 '20 at 07:27
  • The explanation is wrong: `'\u{1D11E}'` is a UTF-16 character (or better, a UTF-16 code point). MDN uses **code unit** (which is a seldom-used notation in Unicode). In this case, `charAt(0)` gives a code unit, which is a surrogate, so not a valid UTF-16 code point (or character). – Giacomo Catenazzi Jul 22 '20 at 07:46
  • @GiacomoCatenazzi feel free to edit my answer. Thanks for the feedback! – Alexandre Elshobokshy Jul 22 '20 at 07:49
  • @GiacomoCatenazzi Hi, what I get from your feedback is that `\u{1D11E}` is a UTF-16 character which `charAt(0)` can handle, but the problem is that `\u{1D11E}` does not have a valid code point? – Json Prime Jul 22 '20 at 08:46
  • @JsonPrime: no/maybe. `\u{1D11E}` is a valid code point; `charAt` doesn't necessarily return valid code points (just a code unit, which may be half of a code point, for code points > `0xFFFF`). If you need the valid code point, you should check the example/fix in the linked MDN page. – Giacomo Catenazzi Jul 22 '20 at 08:50

'\u{1D11E}' is a string consisting of a single Unicode code point, U+1D11E. JavaScript strings are encoded in UTF-16, so each char in the string is a UTF-16 code unit. Thus charAt() returns a code unit, not a code point.

U+1D11E is encoded in UTF-16 as 0xD834 0xDD1E, so the string '\u{1D11E}' is actually '\uD834\uDD1E', thus:

'\u{1D11E}'.charAt(0) === '\u{1D11E}' // false
// aka: '\uD834' === '\u{1D11E}'

and

'\u{1D11E}'.charAt(0) === '\uD834' // true
// aka: '\uD834' === '\uD834'
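
A quick sketch to verify those code units in the console:

const s = '\u{1D11E}';
console.log(s.length);                      // 2, two UTF-16 code units
console.log(s.charCodeAt(0).toString(16));  // "d834", the high surrogate
console.log(s.charCodeAt(1).toString(16));  // "dd1e", the low surrogate
console.log(s.codePointAt(0).toString(16)); // "1d11e", the full code point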
Remy Lebeau