I've got a string that is a mixture of text and emoji. I want to split the string into characters and deal with each character one at a time.

The issue here is that a single emoji has no fixed length: it can be made up of several code points. For example, the emoji's encoding can also include a skin tone modifier and a gender modifier. When I iterate over the string as a character sequence, I end up splitting the emoji apart.

Is there any somewhat reliable way to split a string in such a way that the bytes that are part of a single emoji stay together?

Joel
  • Maybe this helps: https://stackoverflow.com/a/32872406/5734097 – D.Kastier Jan 21 '22 at 19:16
  • It's not just emojis that can be made up of more than one code point. And of course if you are simply iterating the UTF-16 code units (Kotlin Chars) of pure text, you'll be splitting individual code points that use more than 16 bits. Do you actually want to more generally iterate grapheme clusters? – Tenfour04 Jan 21 '22 at 19:45
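
A minimal sketch of the grapheme-cluster iteration suggested in the comment above, assuming Kotlin on the JVM and `java.text.BreakIterator`. The helper name `splitGraphemes` is hypothetical and not from the question. How well this keeps emoji with skin tone or gender modifiers together depends on the Unicode support of the runtime; older JDKs may still split such sequences, and on Android `android.icu.text.BreakIterator` (API 24+) may be the better choice.

```kotlin
import java.text.BreakIterator

// Hypothetical helper (not from the question): splits text at the
// character boundaries reported by java.text.BreakIterator, so that
// surrogate pairs and combining marks stay with their base character.
fun splitGraphemes(text: String): List<String> {
    val iterator = BreakIterator.getCharacterInstance()
    iterator.setText(text)
    val clusters = mutableListOf<String>()
    var start = iterator.first()
    var end = iterator.next()
    while (end != BreakIterator.DONE) {
        // Each [start, end) range is one user-perceived character
        // according to the runtime's boundary rules.
        clusters.add(text.substring(start, end))
        start = end
        end = iterator.next()
    }
    return clusters
}

fun main() {
    // "a", U+1F600 (stored as a surrogate pair), "b"
    val sample = "a\uD83D\uDE00b"
    for (cluster in splitGraphemes(sample)) {
        println("$cluster (${cluster.length} UTF-16 code units)")
    }
}
```

Each element of the returned list can then be processed one at a time, and joining the elements gives back the original string.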

0 Answers