2

This is my initial string:

I used a not so elegant way to break up the emojis.

=if(len(I88) = 4, REGEXEXTRACT(I88,"(.+?)\s*(.+?)"),if(len(I88) = 6, REGEXEXTRACT(I88,"(.+?)\s*(.+?)\s*(.+?)"),if(len(I88) = 8, REGEXEXTRACT(I88,"(.+?)\s*(.+?)\s*(.+?)\s*(.+?)"),if(len(I88) = 10, REGEXEXTRACT(I88,"(.+?)\s*(.+?)\s*(.+?)\s*(.+?)\s*(.+?)"), REGEXEXTRACT(I88,"\s*(.+?)" )))))

The result is 4 columns instead of 3: this is what it looks like

  |  |   |     

I left the pipes to indicate a separate column

What I want is this:

 |  |  
Rubén
  • 34,714
  • 9
  • 70
  • 166
Ray Man
  • 65
  • 11
  • This is a problem in JS ES5 regex syntax. Instead of a `.`, you need to use `(?:[\0-\t\x0B\f\x0E-\u2027\u202A-\uD7FF\uE000-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?:[^\uD800-\uDBFF]|^)[\uDC00-\uDFFF])`. Also, you cannot use `(.+?)` at the end of the pattern, it will match nothing. – Wiktor Stribiżew Aug 30 '16 at 10:09
  • @WiktorStribiżew: Actually the problem is related with the implementation of [RE2](https://github.com/google/re2) in Google Sheets but the solution requires the use of JavaScript. – Rubén Aug 30 '16 at 20:55

1 Answers1

-1

Short answer

To correctly separate the three emoticons we need to use a custom function. Fortunaly there are JavaScript libraries that could be used for this like the one shared in the answer by Orlin Giorgiev to Get grapheme character count in javascript strings?

Explanation

The OP formula is returning four elements instead of three because Google Sheets built-in functions requires four "characters" (actually they are code points) that need more than 4 hexadecimal digits to represent them. Each set of "characters" to represent emoticons are called "astral code points".

From https://mathiasbynens.be/notes/javascript-unicode

Astral code points are pretty easy to recognize: if you need more than 4 hexadecimal digits to represent the code point, it’s an astral code point.


Internally, JavaScript [as well Google Sheets built-in functions] represents astral symbols as surrogate pairs, and it exposes the separate surrogate halves as separate “characters”. If you represent the symbols using nothing but ECMAScript 5-compatible escape sequences, you’ll see that two escapes are needed for each astral symbol. This is confusing, because humans generally think in terms of Unicode symbols or graphemes instead.

Custom function

function SPLITGRAPHEMES(string) {
  var splitter = new GraphemeSplitter();
  return splitter.splitGraphemes(string); 
}

NOTE: Don't forget to include the referred JavaScript library

Syntax

Assume that A1 contains emoticons. To split the three emoticons in a 1 x 3 array use the following formula:

=TRANSPOSE(SPLITGRAPHEMES(A1))

Note: In Windows the emoticons () in this Q&A doesn't look the same as in Chrome OS, so a image was used in the above paragraph.

Community
  • 1
  • 1
Rubén
  • 34,714
  • 9
  • 70
  • 166