
I need to somehow remove all characters except emoji from a string in Python.

I really needed an answer to this, so I made one myself. Hope someone finds it useful. This question is a Q&A and requires no further context, but you're free to add your own answers.

Maksiks
    Well, https://www.unicode.org/Public/UCD/latest/ucd/emoji/emoji-data.txt (and https://stackoverflow.com/questions/30470079/emoji-value-range) – tevemadar Aug 24 '23 at 10:00
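The emoji-data.txt file linked in this comment lists single code points or `start..end` ranges per property (`Emoji`, `Emoji_Presentation`, and so on), followed by `#` comments. A minimal standard-library parser for that format might look like the sketch below; `parse_emoji_data` and `keep_listed` are hypothetical helper names, not part of any library.

```python
def parse_emoji_data(lines):
    """Map each property name in emoji-data.txt to a set of code points.

    Data lines look like '231A..231B    ; Emoji_Presentation   # ...':
    a single code point or a 'start..end' range, a semicolon, then the
    property name. Everything after '#' is a comment.
    """
    props = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        cp_field, prop = (field.strip() for field in data.split(";"))
        if ".." in cp_field:
            start, end = (int(cp, 16) for cp in cp_field.split(".."))
        else:
            start = end = int(cp_field, 16)
        props.setdefault(prop, set()).update(range(start, end + 1))
    return props


def keep_listed(text, code_points):
    """Keep only characters whose code point is in code_points."""
    return "".join(ch for ch in text if ord(ch) in code_points)
```

Usage would be along the lines of `with open("emoji-data.txt", encoding="utf-8") as f: props = parse_emoji_data(f)`, then `keep_listed(text, props["Emoji"])`. Note that the `Emoji` property also covers the ASCII digits, `#` and `*` (they start keycap sequences), so a per-code-point filter built from it keeps plain digits too, and it splits multi-code-point sequences.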

6 Answers


An alternative approach that should support complex graphemes:

  1. Split the text into graphemes.
  2. Keep the graphemes that are emoji.
  3. Join the list elements into a string.

a) a solution using the emoji and regex modules:

import emoji
import regex as re
text = "©️ 1️⃣  Hello, world!  from ‍‍‍ in  ☝"
graphemes = re.findall(r'\X', text)
result = "".join([grapheme for grapheme in graphemes if emoji.is_emoji(grapheme)])
print(result)
# ©️1️⃣‍‍‍☝

b) just using the regex module:

import regex as re
text = "©️ 1️⃣  Hello, world!  from ‍‍‍ in  ☝"
graphemes = re.findall(r'\X', text)
result = "".join([grapheme for grapheme in graphemes if re.match(r'^\p{Emoji}(\uFE0F\u20E3?|[\p{Emoji}\u200D])*$', grapheme)])
print(result)
# ©️1️⃣‍‍‍☝
Andj
  • The neatest solution – Maksiks Aug 25 '23 at 13:04
  • @Maksiks I wouldn't use solution a, and solution b needs additional logic to handle edge cases and to give a developer control over how characters with default text presentation, and characters followed by VS15, are processed. I think there are four states that need to be accounted for: 1) emoji with default text presentation, 2) emoji with VS15, 3) emoji with VS16, and 4) emoji with default emoji presentation. 3) and 4) should automatically count as emoji, while 1) and 2) should optionally, and independently, count as emoji. If that makes sense. So my solution still needs a lot of work. – Andj Aug 26 '23 at 10:01
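The four states described in this comment can be sketched with the standard library alone. This is a simplified illustration, not a complete solution: it assumes the caller supplies `emoji_chars` (code points with the `Emoji` property) and `emoji_presentation` (code points with `Emoji_Presentation`), for example loaded from emoji-data.txt, and it only pairs a base character with an immediately following VS15/VS16. `State`, `classify` and `keep_emoji` are hypothetical names.

```python
from enum import Enum

VS15 = "\uFE0E"  # VARIATION SELECTOR-15: requests text presentation
VS16 = "\uFE0F"  # VARIATION SELECTOR-16: requests emoji presentation


class State(Enum):
    TEXT_DEFAULT = 1   # emoji whose default presentation is text, no selector
    TEXT_VS15 = 2      # emoji followed by VS15
    EMOJI_VS16 = 3     # emoji followed by VS16
    EMOJI_DEFAULT = 4  # emoji whose default presentation is emoji, no selector


def classify(base, selector, emoji_presentation):
    """Return the presentation state of one base character plus optional selector."""
    if selector == VS15:
        return State.TEXT_VS15
    if selector == VS16:
        return State.EMOJI_VS16
    if base in emoji_presentation:
        return State.EMOJI_DEFAULT
    return State.TEXT_DEFAULT


def keep_emoji(text, emoji_chars, emoji_presentation,
               keep_text_default=False, keep_vs15=False):
    """Keep emoji (with their selector) and drop everything else.

    States 3 and 4 are always kept; states 1 and 2 are opt-in, independently.
    """
    kept = {State.EMOJI_VS16, State.EMOJI_DEFAULT}
    if keep_text_default:
        kept.add(State.TEXT_DEFAULT)
    if keep_vs15:
        kept.add(State.TEXT_VS15)

    out = []
    i = 0
    while i < len(text):
        base = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        selector = nxt if nxt in (VS15, VS16) else ""
        if base in emoji_chars and classify(base, selector, emoji_presentation) in kept:
            out.append(base + selector)
        i += 1 + len(selector)  # skip the selector we just consumed
    return "".join(out)
```

With `"©©︎©️"` and the default flags only the VS16 copyright sign survives; setting `keep_text_default=True` or `keep_vs15=True` adds states 1 or 2 back. Keycaps, ZWJ sequences and tag sequences would still need grapheme segmentation (such as `regex`'s `\X`) on top of this.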

Solution 1: You can use the emoji package to extract emojis, as shown below.

import emoji
text = "©️ 1️⃣  Hello, world!  from ‍‍‍ in  ☝"
# emoji_list() returns one dict per match; its 'emoji' key holds the matched emoji
print("".join(match['emoji'] for match in emoji.emoji_list(text)))

Output

©️1️⃣‍‍‍☝

Solution 2: You can use the demoji package to extract emojis, as shown below. (This solution doesn't preserve the order in which emojis appear in the text; use findall_list to keep the order with demoji.)

import demoji
text = "©️ 1️⃣  Hello, world!  from ‍‍‍ in  ☝"
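# findall() returns a dict mapping each emoji found to its description;
# joining the dict iterates over its keys, i.e. the emoji themselves
print("".join(demoji.findall(text)))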

Output

©️☝‍‍‍1️⃣
Mahesh Karia
  • For solution 1: given `text = "©©︎©️"`, which is the code points `U+00A9 U+00A9 U+FE0E U+00A9 U+FE0F`, your code returns `U+00A9 U+00A9 U+00A9 U+FE0F`. The question is whether "emoji" whose default presentation is text (like the copyright sign) should be captured as emoji, and whether emoji followed by VS15 should be captured as emoji or not. Currently I'm leaning towards a function with a more nuanced interpretation of emoji, with flexibility built in. – Andj Aug 26 '23 at 09:50
  • Solution 2 returns `U+00A9 U+FE0F, U+00A9`. Solutions 1 and 2 behave differently and yield different results. The difference may be related to demoji's use of dicts rather than to its interpretation of what constitutes an emoji. – Andj Aug 26 '23 at 09:54
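Comparisons like the ones in these comments are easiest to make by printing code points rather than rendered glyphs, since variation selectors are invisible. A small standard-library helper for that (the names `codepoints` and `describe` are illustrative):

```python
import unicodedata


def codepoints(s):
    """Return the code points of s as space-separated U+XXXX labels."""
    return " ".join(f"U+{ord(ch):04X}" for ch in s)


def describe(s):
    """List each code point together with its Unicode character name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in s]


text = "\u00A9\u00A9\uFE0E\u00A9\uFE0F"  # the example from the first comment
print(codepoints(text))
# U+00A9 U+00A9 U+FE0E U+00A9 U+FE0F
```

Running `codepoints()` on the output of each solution makes differences such as a dropped VS15 or a reordered result visible at a glance.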

Here's a working solution, although it's not ideal.

It checks for English text (this can be changed) and some other characters in a string, and removes them.

There's also a check for whether the string contains only text. To exclude other characters as well, add them to the character class in both r'[a-zA-Z0-9()_/., ]*$' and r'[^a-zA-Z0-9()_/., ]'.

import re

input_string = " test ♣️ ⚾️       test   ⤵️  ⛎"

# Matches a string made up entirely of "text" characters
m = re.compile(r'[a-zA-Z0-9()_/., ]*$')
if m.match(input_string):
    print("Uh oh, contains text only")
else:
    blankPrompt = []
    for i in input_string:
        print(i)
        # Keep the character only if it is NOT in the text character class
        if re.search(r'[^a-zA-Z0-9()_/., ]', i):
            print("safe")
            blankPrompt.append(i)
        else:
            print("unsafe")
    beautifiedPrompt = ''.join(blankPrompt)
    print(beautifiedPrompt)
Maksiks
    Easier to use the `regex` module instead of `re`, and test for emoji rather than trying to detect non-emoji, especially since some of the characters you are stripping out can be emoji if VS16 is present. – Andj Aug 25 '23 at 06:58
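To illustrate this comment's point with the standard library only: keycap emoji such as 1️⃣ start with an ASCII digit (DIGIT ONE, VS16, COMBINING ENCLOSING KEYCAP), so an allow-list filter that strips digits leaves only the invisible tail of the sequence. The sketch reuses the character class from the answer; `strip_text` is a hypothetical name.

```python
import re

# The "non-text" character class used in the answer above
NOT_TEXT = re.compile(r'[^a-zA-Z0-9()_/., ]')


def strip_text(s):
    """Keep only characters the answer's pattern treats as non-text."""
    return "".join(ch for ch in s if NOT_TEXT.search(ch))


keycap_one = "1\uFE0F\u20E3"  # DIGIT ONE + VS16 + COMBINING ENCLOSING KEYCAP
print([f"U+{ord(c):04X}" for c in strip_text(keycap_one)])
# ['U+FE0F', 'U+20E3']: the digit is gone, so the keycap emoji is broken
```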

Install the emoji package with pip, then this code should work:

import emoji

text = " Hello, world! "

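# Keep only the characters that emoji.is_emoji() accepts.
# This checks one code point at a time, so multi-code-point emoji
# (for example keycaps and ZWJ sequences) are not kept intact.
text = "".join(char for char in text if emoji.is_emoji(char))

print(text)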
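A `for char in text` loop sees code points, not user-perceived characters, which is why per-character `is_emoji` checks can split or drop emoji. The standard-library snippet below (the sample strings are illustrative) shows how many code points a few common emoji occupy; grapheme segmentation, such as `regex`'s `\X` in the first answer, is what keeps them together.

```python
def code_point_labels(s):
    """Return one 'U+XXXX' label per code point in s."""
    return [f"U+{ord(c):04X}" for c in s]


# Each of these displays as a single emoji but spans several code points.
samples = {
    "keycap one": "1\uFE0F\u20E3",                          # digit + VS16 + keycap
    "thumbs up, medium skin tone": "\U0001F44D\U0001F3FD",  # base + modifier
    "copyright, emoji presentation": "\u00A9\uFE0F",        # base + VS16
}

for label, s in samples.items():
    print(f"{label}: {len(s)} code points {code_point_labels(s)}")
```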

Make a list of characters, then loop over the string and replace any character that's in the list with an empty string:

import string
removeList = list(string.ascii_lowercase) + list(string.ascii_uppercase)
someStr = "wifjwifjifj"
for v in someStr:
    if v in removeList:
        # replace() removes every occurrence of v, not just this one
        someStr = someStr.replace(v, '')

print(someStr)
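The same deny-list idea can be written with `str.maketrans` and `str.translate`, which delete every listed character in a single pass instead of calling `replace()` once per character. As in the answer, anything not in the list survives, including emoji but also other punctuation and non-Latin letters. `REMOVE` and `remove_listed` are illustrative names, and the sample string is an assumption.

```python
import string

# Characters to delete: every ASCII letter. Extend this string
# (digits, punctuation, whitespace) to widen the deny-list.
REMOVE = string.ascii_letters
DELETE_TABLE = str.maketrans("", "", REMOVE)  # third argument maps to None


def remove_listed(s):
    """Delete every character in REMOVE from s in a single pass."""
    return s.translate(DELETE_TABLE)


print(remove_listed("wifj\u2764\uFE0Fwifjifj"))
```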
import emoji

text_with_emoji = " Hello, world! "

# Keep only the characters that emoji.is_emoji() accepts
text = "".join(char for char in text_with_emoji if emoji.is_emoji(char))

print(text)
Aman Raheja