I recently needed to extract all emojis from a string in order to count the occurrences of specific emojis. The emoji Python package lets me extract all emojis, but modifiers such as skin tones are always extracted as separate emojis. I want to ignore skin tones (Fitzpatrick modifiers) and variation selectors (see this page for types and background on Fitzpatrick from Wikipedia). The following code returns Fitzpatrick modifiers as separate emojis, which is not what I need:
    import emoji

    def extract_emojis(str):
        # Tests one code point at a time, so modifiers come back as separate items.
        return list(c for c in str if c in emoji.UNICODE_EMOJI)
Example: this emoji ❤️ is actually composed of two parts, a heart (Unicode code point U+2764) and a variation selector (Unicode code point U+FE0F), which requests emoji presentation, i.e. the colored red heart. print(repr('❤️')) results in \u2764\ufe0f: two separate code points but only one emoji. The second code point does not make sense on its own, yet it is returned as a separate emoji in the list from extract_emojis.
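To make the two-code-point structure concrete, and as one possible direction rather than a definitive fix, here is a minimal standard-library sketch. It assumes that "modifiers" means only the five Fitzpatrick skin-tone code points (U+1F3FB to U+1F3FF) and the variation selectors (U+FE00 to U+FE0F). The helper names describe and strip_modifiers are hypothetical and not part of the emoji package.

```python
import unicodedata

# Assumption: only these two ranges count as "modifiers" in this sketch.
SKIN_TONES = range(0x1F3FB, 0x1F400)         # U+1F3FB..U+1F3FF (Fitzpatrick modifiers)
VARIATION_SELECTORS = range(0xFE00, 0xFE10)  # U+FE00..U+FE0F

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in text]

def strip_modifiers(text):
    """Hypothetical helper: drop skin-tone modifiers and variation selectors."""
    return "".join(
        c for c in text
        if ord(c) not in SKIN_TONES and ord(c) not in VARIATION_SELECTORS
    )

# The red heart is a base character followed by a variation selector.
print(describe("\u2764\ufe0f"))
# [('U+2764', 'HEAVY BLACK HEART'), ('U+FE0F', 'VARIATION SELECTOR-16')]

# A thumbs-up with a medium skin tone reduces to the bare thumbs-up.
print(strip_modifiers("\U0001F44D\U0001F3FD") == "\U0001F44D")  # True
```

Passing the stripped string to a per-character extraction should then count every skin-tone variant under its base emoji. Sequences joined with a zero-width joiner (U+200D) are not handled by this sketch.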