How to remove everything except emoji from a string?

Question

I need to somehow remove all characters except emoji from a string in Python.

I really needed an answer to this, so I made one myself. I hope someone finds it useful. This question is a Q&A and requires no further context, but feel free to add your answers.

Well, https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-data.txt (and https://stackoverflow.com/questions/30470079/emoji-value-range) — tevemadar, Aug 24 '23 at 10:00

Andj · Answer 1 · 2023-08-25T07:17:17.187

1

An alternative approach that should support complex graphemes:

1) Split the text into graphemes.
2) Keep the graphemes that are emoji.
3) Join the list elements into a string.

a) a solution using the emoji and regex modules:

import emoji
import regex as re
text = "©️ 1️⃣  Hello, world!  from ‍‍‍ in  ☝"
graphemes = re.findall(r'\X', text)
result = "".join([grapheme for grapheme in graphemes if emoji.is_emoji(grapheme)])
print(result)
# ©️1️⃣‍‍‍☝

b) just using the regex module:

import regex as re
text = "©️ 1️⃣  Hello, world!  from ‍‍‍ in  ☝"
graphemes = re.findall(r'\X', text)
result = "".join([grapheme for grapheme in graphemes if re.match(r'^\p{Emoji}(\uFE0F\u20E3?|[\p{Emoji}\u200D])*$', grapheme)])
print(result)
# ©️1️⃣‍‍‍☝

edited Aug 25 '23 at 07:17

answered Aug 25 '23 at 06:54

Andj


The neatest solution – Maksiks Aug 25 '23 at 13:04
@Maksiks I wouldn't use solution a, and solution b needs additional logic to handle edge cases and to give a developer control over how default-text-presentation characters and characters with VS15 are processed. I think there are four states that need to be accounted for: 1) emoji with default text presentation, 2) emoji with VS15, 3) emoji with VS16, and 4) emoji with default emoji presentation. 3) and 4) should automatically count as emoji, and 1) and 2) should optionally (and independently) be counted as emoji. If that makes sense. So my solution still needs a lot of work. – Andj Aug 26 '23 at 10:01
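A minimal, stdlib-only sketch of the four states described in the comment above. It assumes the property table comes from emoji-data.txt (linked in the comment on the question); `SAMPLE_DATA` is a small hand-written stand-in in that file's format, not a verbatim excerpt, and `parse_emoji_data` / `keep_emoji` are hypothetical helpers, not part of any package. It works per code point rather than per grapheme, so it is only a starting point: for example, it does not handle emoji tag sequences (such as subdivision flags) and can leave a trailing ZWJ.

```python
# Illustrative rows in the emoji-data.txt format ("code points ; property # comment").
# This is a hand-picked sample, NOT a verbatim excerpt; load the real file in practice.
SAMPLE_DATA = """
0023          ; Emoji                # number sign
0030..0039    ; Emoji                # digits zero..nine
00A9          ; Emoji                # copyright sign
261D          ; Emoji                # index pointing up
1F600..1F64F  ; Emoji                # emoticons
1F600..1F64F  ; Emoji_Presentation   # emoticons
"""

VS15, VS16, ZWJ, KEYCAP = "\uFE0E", "\uFE0F", "\u200D", "\u20E3"


def parse_emoji_data(data):
    """Map each property name to the set of code points it covers."""
    props = {}
    for line in data.splitlines():
        line = line.split("#", 1)[0].strip()  # strip the trailing comment
        if not line:  # skip blank and comment-only lines
            continue
        cps, prop = (part.strip() for part in line.split(";"))
        start, _, end = cps.partition("..")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo
        props.setdefault(prop, set()).update(range(lo, hi + 1))
    return props


def keep_emoji(text, props, keep_text_default=False, keep_vs15=False):
    """Keep emoji, deciding per base code point which of the four states count."""
    emoji = props.get("Emoji", set())
    presentation = props.get("Emoji_Presentation", set())
    out = []
    i, n = 0, len(text)
    while i < n:
        cp = ord(text[i])
        if cp not in emoji:
            i += 1
            continue
        nxt = text[i + 1] if i + 1 < n else ""
        if nxt == VS16:
            keep = True               # 3) emoji with VS16
        elif nxt == VS15:
            keep = keep_vs15          # 2) emoji with VS15
        elif cp in presentation:
            keep = True               # 4) default emoji presentation
        else:
            keep = keep_text_default  # 1) default text presentation
        # Take the base plus any trailing selectors, keycap mark or ZWJ.
        j = i + 1
        while j < n and text[j] in (VS15, VS16, KEYCAP, ZWJ):
            j += 1
        if keep:
            out.append(text[i:j])
        i = j
    return "".join(out)


props = parse_emoji_data(SAMPLE_DATA)
print(keep_emoji("\u00A9\u00A9\uFE0E\u00A9\uFE0F", props))  # only the VS16 copyright sign survives
```

With `keep_text_default=True` the bare © is kept as well, and `keep_vs15=True` also keeps ©︎, so the choice between the four states becomes an explicit parameter rather than a side effect of the library used.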

Mahesh Karia · Answer 2 · 2023-08-25T14:33:21.393

0

Solution 1: You can use the emoji package to extract emojis as shown below.

import emoji
text = "©️ 1️⃣  Hello, world!  from ‍‍‍ in  ☝"
print("".join(match['emoji'] for match in emoji.emoji_list(text)))

Output

©️1️⃣‍‍‍☝

Solution 2: You can use the demoji package to extract emojis as shown below. This solution doesn't maintain the order of the emojis in the given text; you can use `findall_list` to maintain order with demoji.

import demoji
text = "©️ 1️⃣  Hello, world!  from ‍‍‍ in  ☝"
print("".join(demoji.findall(text)))  # findall returns a dict; joining it concatenates its keys (the emojis)

Output

©️☝‍‍‍1️⃣

edited Aug 25 '23 at 14:33

answered Aug 25 '23 at 14:24

Mahesh Karia


For solution 1: given `text = "©©︎©️"`, which is the code points `U+00A9 U+00A9 U+FE0E U+00A9 U+FE0F`, your code returns `U+00A9 U+00A9 U+00A9 U+FE0F`. The question is whether "emoji" whose default presentation is text (like the copyright sign) should be captured as emoji, and whether emoji followed by VS15 should be captured as emoji or not. Currently I'm leaning towards a function with a more nuanced interpretation of emoji, with flexibility built in. – Andj Aug 26 '23 at 09:50
Solution 2 returns `U+00A9 U+FE0F, U+00A9`. Solutions 1 and 2 behave differently and yield different results. The difference may be related to demoji's use of dicts rather than to its interpretation of what constitutes an emoji. – Andj Aug 26 '23 at 09:54
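To reproduce code-point listings like the ones in the comments above, a small stdlib-only helper is enough; `codepoints` is a hypothetical name, not part of either package.

```python
def codepoints(s):
    """Return the code points of s as space-separated U+XXXX strings."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


text = "\u00A9\u00A9\uFE0E\u00A9\uFE0F"  # ©, then © + VS15, then © + VS16
print(codepoints(text))
# U+00A9 U+00A9 U+FE0E U+00A9 U+FE0F
```

Printing code points rather than the rendered string makes invisible differences such as VS15/VS16 or a ZWJ visible when comparing the output of different libraries.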

Maksiks · Answer 3 · 2023-08-29T08:17:05.703

Here's a working solution, although it's not ideal.

It checks for English text (this can be changed) and some other characters in a string and removes them.

There's also a check for whether the string contains only text. You can add other characters to `r'[a-zA-Z0-9()_/., ]*$'` to exclude them as well.

import re

input_string = " test ♣️ ⚾️       test   ⤵️  ⛎"

# Characters treated as plain text: English letters, digits, space and a few punctuation marks.
m = re.compile(r'[a-zA-Z0-9()_/., ]*$')
if m.match(input_string):
    print("Uh oh, contains text only")
else:
    blankPrompt = []
    for i in input_string:
        print(i)
        # Keep any character that is not in the plain-text class.
        if re.search(r'[^a-zA-Z0-9()_/., ]', i):
            print("safe")
            blankPrompt.append(i)
        else:
            print("unsafe")
    beautifiedPrompt = ''.join(blankPrompt)
    print(beautifiedPrompt)

It's easier to use the `regex` module instead of `re` and test for emoji, rather than trying to detect non-emoji, especially since some of the characters you are stripping out can be emoji if VS16 is present. – Andj, Aug 25 '23 at 06:58

score -1 · Answer 4 · answered Aug 24 '23 at 10:29


Install the `emoji` package with pip; then this code should work:

import emoji

text = " Hello, world! "

for char in text:
    if emoji.is_emoji(char):
        text = text.replace(char, '')

print(text)

answered Aug 24 '23 at 10:29

AquaticHoney


Doesn't this do the opposite of what the question asks? – Andj Aug 25 '23 at 06:26

score -1 · Answer 5 · answered Aug 24 '23 at 10:43


Make a list of characters, then loop over the string and replace any character that's in the list with an empty string.

import string
removeList = list(string.ascii_lowercase) + list(string.ascii_uppercase)
someStr = "wifjwifjifj"
for v in someStr:
    if v in removeList:
        someStr = someStr.replace(char, '')

print()

answered Aug 24 '23 at 10:43

Da Krabs Koder


Please double-check your example before posting the answer. In `.replace(char, '')`, `char` is not defined, and you are also not printing `someStr`. – João Areias Aug 26 '23 at 18:03
Answer is wrong – João Areias Aug 26 '23 at 18:04
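For reference, a corrected sketch of the idea in this answer, fixing the two problems noted in the comment (the undefined `char` and the missing print). `remove_letters` is a hypothetical helper name. Note that it still only removes ASCII letters, so digits, punctuation and spaces survive; on its own it does not keep *only* emoji.

```python
import string

# ASCII letters to strip; a set gives fast membership tests.
REMOVE_CHARS = set(string.ascii_letters)


def remove_letters(s):
    """Return s with every ASCII letter removed."""
    return "".join(ch for ch in s if ch not in REMOVE_CHARS)


someStr = "wifjwifjifj"
print(repr(remove_letters(someStr)))    # ''
print(repr(remove_letters("Hi 123!")))  # ' 123!'
```

Building a new string with a generator avoids calling `str.replace` repeatedly while looping over the same string.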

score -1 · Answer 6 · answered Aug 24 '23 at 16:06


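import emoji

text_with_emoji = " Hello, world! "
text = text_with_emoji  # define `text` before the loop modifies it

for char in text_with_emoji:
    # Drop every character that the emoji package recognises as an emoji.
    if emoji.is_emoji(char):
        text = text.replace(char, '').strip()

print(text)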

answered Aug 24 '23 at 16:06

Aman Raheja


Doesn't this do the opposite of what the question is asking? – Andj Aug 25 '23 at 06:24
It does the opposite – Maksiks Aug 25 '23 at 13:01
