
Getting the first character in a string is fairly straightforward.

const str = 'abc'
str[0] // 'a' 

However, when the string starts with a character outside the Basic Multilingual Plane (such as an emoji), JavaScript returns only the first 16-bit code unit of it, not the whole character.

const strUnicode = 'hi'
strUnicode[0] // '�'
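The '�' shown above is how a lone surrogate is displayed. String indexing in JavaScript works on UTF-16 code units, and a character above U+FFFF is stored as two of them. The sketch below makes this visible. It assumes U+1F600 (a grinning-face emoji) as the example character, though any code point above U+FFFF behaves the same way.

```javascript
// Assumed example string: U+1F600 followed by "hi".
const s = '\u{1F600}hi'

// length counts UTF-16 code units: 2 for the emoji plus 2 for "hi".
console.log(s.length) // 4

// Index 0 holds only the high surrogate, index 1 the low surrogate.
console.log(s.charCodeAt(0).toString(16)) // d83d
console.log(s.charCodeAt(1).toString(16)) // de00

// codePointAt(0) decodes the whole pair back into a single code point.
console.log(s.codePointAt(0).toString(16)) // 1f600
```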

Is it possible to get the first complete Unicode character instead?

const strUnicode = 'hi'
f(strUnicode) // ''
Kiran Rao
  • `String.fromCodePoint('hi'.codePointAt(0));` See https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/codePointAt – Teemu Jan 13 '21 at 06:12
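The comment's suggestion can be wrapped in a small function. This is a sketch: `firstChar` is a hypothetical name, and the empty-string check is an addition, because `codePointAt(0)` returns `undefined` for an empty string and `String.fromCodePoint(undefined)` throws a `RangeError`. Like the comment, it returns one code point, which is not always one visible character.

```javascript
// Hypothetical helper built on the codePointAt / fromCodePoint approach.
function firstChar(str) {
  // codePointAt(0) is undefined for '', and fromCodePoint(undefined) throws.
  if (str.length === 0) return ''
  // When index 0 is a high surrogate, codePointAt(0) reads the whole pair.
  return String.fromCodePoint(str.codePointAt(0))
}

console.log(firstChar('abc')) // a
console.log(firstChar('\u{1F600}hi') === '\u{1F600}') // true
```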

1 Answer


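The issue is that characters outside the Basic Multilingual Plane (such as most emoji) do not fit in a single 16-bit UTF-16 code unit. JavaScript stores each of them as a surrogate pair, so it takes 2 positions in the string.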
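For reference, the two code units follow from the standard UTF-16 encoding rule: subtract 0x10000, then put the top 10 bits of the remaining 20-bit value into a high surrogate (0xD800 to 0xDBFF) and the bottom 10 bits into a low surrogate (0xDC00 to 0xDFFF). The helper name `toSurrogatePair` is hypothetical, and U+1F600 is an assumed example; the sketch checks its result against JavaScript's own encoding of that literal.

```javascript
// Hypothetical helper: split a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError('expected a code point in U+10000..U+10FFFF')
  }
  const offset = codePoint - 0x10000 // leaves a 20-bit value
  const high = 0xd800 + (offset >> 10) // top 10 bits -> high (lead) surrogate
  const low = 0xdc00 + (offset & 0x3ff) // bottom 10 bits -> low (trail) surrogate
  return [high, low]
}

const [hi, lo] = toSurrogatePair(0x1f600)
console.log(hi.toString(16), lo.toString(16)) // d83d de00
// The same two code units JavaScript stores for the literal '\u{1F600}':
console.log(String.fromCharCode(hi, lo) === '\u{1F600}') // true
```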

Idea:

  • Loop over the string and check whether the current code unit is a high surrogate, i.e. the first half of a surrogate pair.
  • If it is, take the code units at i and i+1 together, and increment i once more to skip the second half.
  • If not, just take that one code unit.

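function getCharacters(str) {
  const parts = []
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i)
    const next = str.charCodeAt(i + 1) // NaN past the end, so the check below fails safely
    // A high surrogate (0xD800-0xDBFF) followed by a low surrogate (0xDC00-0xDFFF)
    // encodes one code point above U+FFFF, so keep both code units together.
    if (code >= 0xD800 && code <= 0xDBFF && next >= 0xDC00 && next <= 0xDFFF) {
      parts.push(str.slice(i, i + 2))
      i++ // skip the low surrogate that was just consumed
    } else {
      parts.push(str[i])
    }
  }
  return parts
}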

const strUnicode = 'hi'
console.log( getCharacters(strUnicode) )
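For comparison, JavaScript's built-in string iterator (used by `Array.from`, spread syntax, and `for...of`) already steps through a string one code point at a time, so it keeps surrogate pairs together without a manual index check. A minimal sketch, assuming code points are the unit wanted; `firstCodePoint` is a hypothetical helper and '\u{1F600}hi' an assumed example string.

```javascript
// Array.from iterates a string by code point, so the surrogate pair stays whole.
const chars = Array.from('\u{1F600}hi')
console.log(chars.length) // 3
console.log(chars[1], chars[2]) // h i

// Hypothetical helper: first code point only, without building the whole array.
function firstCodePoint(str) {
  for (const ch of str) return ch // stop after the first code point
  return ''
}

console.log(firstCodePoint('\u{1F600}hi') === '\u{1F600}') // true
```

`Array.from(str)[0]` gives the same result as `firstCodePoint`, but it builds the full array first, which matters only for long strings.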
Rajesh
  • Not deleting this answer, as this approach isn't in the linked post. And yes, the method mentioned above is better to use, but I'll keep this one as an alternative – Rajesh Jan 13 '21 at 06:17
  • `"‍‍".length === 8` - there are characters with greater than 2 positions – Leland Feb 14 '23 at 17:19
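As that comment notes, some characters a reader sees as one (emoji joined with zero-width joiners, flags, letters with combining marks) span several code points, so neither the surrogate check nor `codePointAt` returns them whole. If "character" means a user-perceived character (a grapheme cluster), `Intl.Segmenter` with `granularity: 'grapheme'` is one option. This sketch assumes a runtime that provides `Intl.Segmenter` (recent browsers, Node.js 16 or later); `firstGrapheme` is a hypothetical helper, and the family emoji (man, ZWJ, woman, ZWJ, girl) is an assumed example.

```javascript
// Hypothetical helper: returns the first grapheme cluster of str, or '' if empty.
// Assumes Intl.Segmenter is available in the runtime.
function firstGrapheme(str) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' })
  // segment() returns an iterable of { segment, index, input } records.
  for (const { segment } of segmenter.segment(str)) {
    return segment
  }
  return ''
}

// Assumed example: MAN + ZWJ + WOMAN + ZWJ + GIRL, 8 UTF-16 code units in total.
const family = '\u{1F468}\u200D\u{1F469}\u200D\u{1F467}'
console.log(family.length) // 8
console.log(firstGrapheme(family) === family) // true
// 'e' plus a combining acute accent is one grapheme made of two code points.
console.log(firstGrapheme('e\u0301x') === 'e\u0301') // true
```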