Let's say we have the following strings containing emojis:
sent1 = ' right'
sent2 = 'Some text?! '
sent3 = ''
The task is to remove the text and keep only the emojis, giving the following output:
sent1_emojis = ' '
sent2_emojis = ' '
sent3_emojis = ''
Based on a previous question (Regex Emoji Unicode), I use the following regex to identify strings that contain at least one emoji:
import re

emoji_pattern = re.compile(u".*(["
    u"\U0001F600-\U0001F64F"  # emoticons
    u"\U0001F300-\U0001F5FF"  # symbols & pictographs
    u"\U0001F680-\U0001F6FF"  # transport & map symbols
    u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "])+", flags=re.UNICODE)
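For the stated purpose (checking whether a string contains at least one emoji), the leading `.*` is not strictly needed: `re.search` scans the whole string rather than anchoring at index 0. A minimal sketch, assuming the same four code-point ranges as above (which do not cover every emoji); `EMOJI_RE` and `has_emoji` are hypothetical names used only for illustration.

```python
import re

# Hypothetical name: the same four ranges as the question's pattern, as a bare class.
EMOJI_RE = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]"
)


def has_emoji(text):
    """Return True if text contains at least one code point from the ranges above."""
    # re.search looks anywhere in the string, so no leading ".*" is required.
    return EMOJI_RE.search(text) is not None


print(has_emoji("Some text?! \U0001F600"))  # True
print(has_emoji("right"))                   # False
```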
To get the output string, I use the following (and likewise for the other sentences):
re.match(emoji_pattern, sent1).group(0)
There's a problem with the sent2 string: re.match(emoji_pattern, sent2).group(0) returns the whole of sent2 instead of only the emojis.
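A likely cause: `re.match` anchors at the start of the string, and the greedy `.*` consumes all the text before the last emoji run, so `group(0)` (the entire match) begins at index 0 and includes that text. The repeated group `(...)+` does not help either, because a repeated capturing group keeps only its last repetition. A minimal sketch of one alternative, assuming the same four ranges, uses `re.findall` with the bare character class. `EMOJI_RUN`, `extract_emojis`, the `sep` parameter, and the sample string are hypothetical; joining runs with a space is an assumption about the desired output (pass `sep=""` to concatenate them instead).

```python
import re

# Hypothetical name: the question's four ranges, with "+" so adjacent emojis form one run.
EMOJI_RUN = re.compile(
    "["
    "\U0001F600-\U0001F64F"  # emoticons
    "\U0001F300-\U0001F5FF"  # symbols & pictographs
    "\U0001F680-\U0001F6FF"  # transport & map symbols
    "\U0001F1E0-\U0001F1FF"  # flags (iOS)
    "]+"
)


def extract_emojis(text, sep=" "):
    """Return every emoji run found in text, joined by sep (assumed separator)."""
    # findall returns all non-overlapping runs, not one match anchored at index 0.
    return sep.join(EMOJI_RUN.findall(text))


sample = "Some text?! \U0001F600\U0001F680 more \U0001F308"  # made-up sample
print(extract_emojis(sample))
```

Strings with no emoji in these ranges simply yield an empty string, so no `None` check on a match object is needed.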