I'm trying to split a string into emoji and non-emoji chunks. I got a regex from here and altered it to the following to take the textual variation selector (U+FE0E) into account:
(?:(?!(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])+\ufe0e))(\u00a9|\u00ae|[\u2000-\u3300]|\ud83c[\ud000-\udfff]|\ud83d[\ud000-\udfff]|\ud83e[\ud000-\udfff])+
This works with .match, such as:
'🇦🇨'.match(regex) // (["0x1F1E6", "0x1F1E8"]) => ['🇦🇨']
'🇦🇨\uFE0E'.match(regex) // (["0x1F1E6", "0x1F1E8", "0xFE0E"]) => null
But split isn't giving me the expected results:
''.split(regex) // (["", undefined, "", ""]) => ['']
I need split to return the entire emoji in one element. What am I doing wrong?
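For context on the split output: String.prototype.split splices the contents of every capturing group into the result array. A quantified group such as (...)+ keeps only its last iteration, and a group inside a negative lookahead never holds a value, so it shows up as undefined. The following is a minimal sketch of that behavior, assuming a simplified pattern that covers only the \ud83c range rather than the full regex above; the names capturing, nonCapturing, and wholeRun are illustrative.

```javascript
// Simplified stand-in for the full pattern: only code points whose
// high surrogate is \ud83c (this includes the regional indicators).
const flag = '\ud83c\udde6\ud83c\udde8'; // U+1F1E6 U+1F1E8
const text = 'a' + flag + 'b';

// A quantified capturing group keeps only its last iteration, and
// split() inserts that capture between the pieces.
const capturing = /(\ud83c[\ud000-\udfff])+/;
const withCapture = text.split(capturing);
// ['a', '\ud83c\udde8', 'b'] (only the second half of the flag)

// With non-capturing groups, split() drops the separator entirely.
const nonCapturing = /(?:\ud83c[\ud000-\udfff])+/;
const withoutCapture = text.split(nonCapturing);
// ['a', 'b']

// A single capturing group around the whole run keeps the emoji
// together as one element.
const wholeRun = /((?:\ud83c[\ud000-\udfff])+)/;
const chunks = text.split(wholeRun);
// ['a', flag, 'b']
```

When the input starts or ends with an emoji, split also yields empty strings at the edges, which can be dropped with a filter.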
EDIT:
I have a working regex now, except for the edge case exhibited here: https://regex101.com/r/Vki2ZS/2.
I don't want the second emoji to be matched, since it is followed by the textual variation selector. I think this happens because I'm using a lookahead, as the reversed string is matched as expected, but I can't use a negative lookbehind since it's not supported by all browsers.
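One possible way to avoid lookbehind, offered as a sketch and not as a verified fix for the regex101 pattern (which is not reproduced here): move the negative lookahead inside the repetition so that each emoji unit is checked individually. The character ranges are copied from the regex at the top; the names unit, emojiRun, and splitEmoji are hypothetical.

```javascript
// Character ranges copied from the original regex; names are illustrative.
const unit =
  '(?:\\u00a9|\\u00ae|[\\u2000-\\u3300]' +
  '|\\ud83c[\\ud000-\\udfff]|\\ud83d[\\ud000-\\udfff]|\\ud83e[\\ud000-\\udfff])';

// Each unit must not be followed by U+FE0E, so a run stops right before
// an emoji that carries the textual variation selector.
const emojiRun = new RegExp('((?:' + unit + '(?!\\ufe0e))+)');

function splitEmoji(str) {
  // The single outer capturing group makes split() keep each emoji run;
  // empty strings produced at the edges are dropped.
  return str.split(emojiRun).filter(function (chunk) {
    return chunk !== '';
  });
}

const smile = '\ud83d\ude00'; // U+1F600
const result = splitEmoji('hi' + smile + smile + '\ufe0e' + 'yo');
// ['hi', smile, smile + '\ufe0e' + 'yo']
```

One caveat with this approach: it treats every code point as its own unit, so a multi-code-point sequence such as a flag followed by U+FE0E would be cut in the middle, with only its last code point left in the text chunk.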