Using the 3rd party regex
module (pip install regex
) and Python 3.5:
>>> import regex
>>> s = '\U0001f680\U0001f618\U0001f44d\U0001f3fe\U0001f1e6\U0001f1ee'
>>> import unicodedata as ud
>>> ud.category(s[0])
'So'
>>> ud.category(s[1])
'So'
>>> ud.category(s[2])
'So'
>>> ud.category(s[3])
'Sk'
>>> ud.category(s[4])
'So'
>>> ud.category(s[5])
'So'
>>> regex.findall(r'\p{So}\p{Sk}*',s)
['\U0001f680', '\U0001f618', '\U0001f44d\U0001f3fe', '\U0001f1e6', '\U0001f1ee']
Edit:
The national flags are a two-letter regional indicator symbol from the range U+1F1E6 - U+1F1FF. It turns out regex
has a grapheme cluster \X
switch, but it finds the flags but not the skin tone marker.
>>> regex.findall(r'\X',s)
['\U0001f680', '\U0001f618', '\U0001f44d', '\U0001f3fe', '\U0001f1e6\U0001f1ee']
However, you could look for symbol modifiers OR grapheme clusters:
>>> regex.findall(r'.\p{Sk}+|\X',s)
['\U0001f680', '\U0001f618', '\U0001f44d\U0001f3fe', '\U0001f1e6\U0001f1ee']
There may be other exceptions.