
In ES6, when we call `codePointAt(0)` on a string containing a single character ('𠮷', U+20BB7) whose Unicode code point is larger than U+FFFF (and therefore outside the Basic Multilingual Plane), we get the code point 134071. Yet the string still has two code points in it, which together represent this 134071 value.

```
> (55362).toString(16)
'd842'
> (57271).toString(16)
'dfb7'
> "\ud842\udfb7"
'𠮷'
> const j = "\ud842\udfb7"
undefined
> j
'𠮷'
> j.codePointAt(0)
134071
> j.codePointAt(1)
57271
>
```
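
A quick way to see the two layers side by side (a sketch assuming Node.js or any ES2015+ engine): `length` and `charCodeAt()` count UTF-16 code units, while `codePointAt()` and string iteration (`for...of`, spread) work with whole code points.

```javascript
// U+20BB7 lies outside the BMP, so UTF-16 stores it as two code units.
const s = "\u{20BB7}"; // the same string as "\ud842\udfb7"

console.log(s.length);                         // 2 (length counts UTF-16 code units)
console.log(s.charCodeAt(0), s.charCodeAt(1)); // 55362 57271 (the two code units)
console.log(s.codePointAt(0));                 // 134071 (the decoded code point)
console.log([...s].length);                    // 1 (spread iterates by code point)
console.log(s === "\ud842\udfb7");             // true
```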

My question is: how do we get from the two code points 55362 and 57271 to the single code point 134071? I mean the mathematical relationship.
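
The relationship is the UTF-16 surrogate-pair decoding rule: subtract 0xD800 from the first (high) surrogate and 0xDC00 from the second (low) surrogate, which leaves two 10-bit values; put the first above the second and add 0x10000. Worked out for this string: (55362 - 55296) × 1024 + (57271 - 56320) + 65536 = 66 × 1024 + 951 + 65536 = 134071. A minimal sketch; `surrogatePairToCodePoint` is a hypothetical helper name, not a built-in.

```javascript
// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
function surrogatePairToCodePoint(high, low) {
  if (high < 0xd800 || high > 0xdbff) throw new RangeError("not a high surrogate");
  if (low < 0xdc00 || low > 0xdfff) throw new RangeError("not a low surrogate");
  // Removing each base offset leaves 10 bits of payload per surrogate.
  // For the pair in the question: 55362 - 55296 = 66 and 57271 - 56320 = 951.
  const highBits = high - 0xd800;
  const lowBits = low - 0xdc00;
  // High surrogate supplies the top 10 bits, low surrogate the bottom 10;
  // 0x10000 is added back because pairs only encode U+10000..U+10FFFF.
  return (highBits << 10) + lowBits + 0x10000;
}

console.log(surrogatePairToCodePoint(55362, 57271));           // 134071
console.log(String.fromCharCode(55362, 57271).codePointAt(0)); // 134071, same result
```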

Also, why can we still access the code point at position 1, but not the individual code point at position 0?
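
The index passed to `codePointAt()` is a code-unit index, not a code-point index. At index 0 the method sees a high surrogate followed by a low surrogate and decodes the pair; at index 1 it sees a low surrogate on its own, which cannot start a pair, so it returns that unit unchanged. The unit at index 0 is still reachable, just through `charCodeAt(0)`. The sketch below is a hypothetical `codePointAtSketch`, written for illustration and assuming an integer index, that approximates this behavior and compares it with the built-in:

```javascript
// Hypothetical approximation of String.prototype.codePointAt, for illustration only.
function codePointAtSketch(str, index) {
  if (index < 0 || index >= str.length) return undefined;
  const first = str.charCodeAt(index);
  // Only a high (leading) surrogate can start a pair.
  const isHigh = first >= 0xd800 && first <= 0xdbff;
  if (!isHigh || index + 1 === str.length) return first;
  const second = str.charCodeAt(index + 1);
  // High surrogate followed by a low (trailing) surrogate: decode the pair.
  if (second >= 0xdc00 && second <= 0xdfff) {
    return (first - 0xd800) * 0x400 + (second - 0xdc00) + 0x10000;
  }
  return first; // unpaired high surrogate is returned as-is
}

const j = "\ud842\udfb7";
console.log(codePointAtSketch(j, 0), j.codePointAt(0)); // 134071 134071
console.log(codePointAtSketch(j, 1), j.codePointAt(1)); // 57271 57271 (lone low surrogate)
console.log(j.charCodeAt(0));                           // 55362, the raw unit at index 0
```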

evianpring
  • @gman this question is not answered by the question you linked. You closed this question mistakenly. – evianpring Dec 04 '19 at 07:13
  • this is the duplicate: https://stackoverflow.com/questions/8868432/how-are-surrogate-pairs-calculated – evianpring Dec 04 '19 at 07:17
  • explanation of the UTF-16 algorithm with an example, in both directions: https://stackoverflow.com/a/58215052/46395 – daxim Dec 04 '19 at 11:05
  • @evianpring You are getting your terminology wrong. A string contains [UTF-16](https://en.wikipedia.org/wiki/UTF-16) *codeunits*, not Unicode *codepoints*. Codepoints outside the BMP are represented as *surrogate pairs*. The string's `codePointAt()` method looks at the codeunit at the given index, and if it begins a surrogate pair then the whole pair is decoded, otherwise the codeunit is returned as-is. This is documented behavior – Remy Lebeau Dec 04 '19 at 18:24
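
Going the other way, as in the "both directions" explanation linked above, splits a code point above U+FFFF into its surrogate pair: subtract 0x10000, then the top 10 bits of the remaining 20-bit value go into the high surrogate (offset by 0xD800) and the bottom 10 bits into the low surrogate (offset by 0xDC00). A minimal sketch; `codePointToSurrogatePair` is a hypothetical name, not a built-in.

```javascript
// Hypothetical helper: split a supplementary code point into a UTF-16 surrogate pair.
function codePointToSurrogatePair(cp) {
  if (cp < 0x10000 || cp > 0x10ffff) throw new RangeError("not a supplementary code point");
  const offset = cp - 0x10000;           // 134071 - 65536 = 68535, fits in 20 bits
  const high = 0xd800 + (offset >> 10);  // top 10 bits: 68535 >> 10 = 66
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits: 68535 & 1023 = 951
  return [high, low];
}

console.log(codePointToSurrogatePair(134071));                 // [ 55362, 57271 ]
console.log(String.fromCodePoint(134071) === "\ud842\udfb7"); // true
```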
