
Right now I have a client string that contains the emoji "📲" (Mobile Phone With Arrow). I want to remove it in my text pre-processing step so that I can pass the string to my NLP model. I tried:

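    import re

    def remove_emojis(text: str) -> str:
        emojis = re.compile("["
                            u"\U0001F600-\U0001F64F"  # emoticons
                            u"\U0001F300-\U0001F5FF"  # symbols & pictographs
                            u"\U0001F680-\U0001F6FF"  # transport & map symbols
                            u"\U0001F1E0-\U0001F1FF"  # regional indicators (flags)
                            u"\U00002500-\U00002BEF"  # box drawing through misc. symbols and arrows
                            u"\U00002702-\U000027B0"  # dingbats
                            u"\U000024C2-\U0001F251"  # very wide range: also spans CJK and Hangul
                            u"\U0001f926-\U0001f937"  # supplemental symbols (gestures)
                            u"\U00010000-\U0010ffff"  # all supplementary planes
                            u"\u2640-\u2642"  # gender symbols
                            u"\u2600-\u2B55"  # misc. symbols
                            u"\u200d"  # zero-width joiner
                            u"\u23cf"  # eject symbol
                            u"\u23e9"  # fast-forward symbol
                            u"\u231a"  # watch
                            u"\ufe0f"  # variation selector-16
                            u"\u3030"  # wavy dash
                            "]+", flags=re.UNICODE)
        # Replace every run of matched characters with the empty string.
        return emojis.sub(r'', text)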

But this does not work. I suspect the pattern does not cover this emoji. Is there a way to remove emojis like this one?

PS: Keeping only English characters is not an option in my case, because the client string is not in English.
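
One way to test the suspicion that the pattern misses this emoji is to print the character's code point and try a single range against it. This is only a minimal sketch: the variable names are illustrative, and it assumes the character in the client string is the single code point U+1F4F2.

```python
import re
import unicodedata

emoji = "\U0001F4F2"  # assumed to be the 📲 character from the client string

# Show the code point and the official Unicode name of the character.
code_point = f"U+{ord(emoji):04X}"
name = unicodedata.name(emoji)
print(code_point, name)

# Test one range from the pattern in isolation.
pictographs = re.compile("[\U0001F300-\U0001F5FF]")
covered = pictographs.search(emoji) is not None
print("covered by U+1F300-U+1F5FF:", covered)
```

If `covered` comes out true, the range itself is probably not the problem, and it may be worth checking the function's indentation and whether its return value is actually used.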

Michael
  • Did you try finding the Unicode code point for this emoji? If none is available, you can create your own list of emojis. Here is one: http://forum.pianoworld.com/ubbthreads.php/topics/2904538/1/ot-list-of-all-emojis-for-people-who-need-hands-to-talk.html – dheeraj Mar 28 '22 at 12:58
  • I will definitely try, thank you! – Michael Mar 28 '22 at 13:07
  • Related: [Find out if Character in String is emoji?](https://stackoverflow.com/questions/30757193/find-out-if-character-in-string-is-emoji). The accepted answer contains a lot of documentation, along with very short code for a few functions to test whether a string contains emoji, contains only emoji, contains a single emoji, etc. The code is in Swift but it is very short and very simple, shouldn't be too hard to translate to python. – Stef Mar 28 '22 at 13:15
  • the regex on this site seems to work well: https://www.regextester.com/112392 – nicofrlo Mar 28 '22 at 13:23
  • Actually those might be more helpful than the other question I linked: [Find there is an emoji in a string in python3?](https://stackoverflow.com/questions/36216665/find-there-is-an-emoji-in-a-string-in-python3) and [How to extract all the emojis from text?](https://stackoverflow.com/questions/43146528/how-to-extract-all-the-emojis-from-text) – Stef Mar 28 '22 at 13:55
  • Perhaps less useful but related: [remove unicode emoji using re in python?](https://stackoverflow.com/questions/26568722/remove-unicode-emoji-using-re-in-python) and [Correctly extract Emojis from a Unicode string?](https://stackoverflow.com/questions/35404144/correctly-extract-emojis-from-a-unicode-string) – Stef Mar 28 '22 at 13:57
  • Check [this solution](https://stackoverflow.com/a/69423881/8704180) out :) – meti Mar 30 '22 at 06:23
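
Besides the regex-based approaches linked in the comments, another option that avoids hand-maintained ranges is to filter characters by Unicode general category. The following is a minimal sketch and is not taken from any of the linked posts. The function name `strip_symbols` is hypothetical, and the approach assumes it is acceptable to drop every character in category `So` (Symbol, other) along with the zero-width joiner and variation selector-16. It also removes non-emoji symbols such as ©, and it misses emoji in other categories (for example skin-tone modifiers), so it is a starting point rather than a complete solution.

```python
import unicodedata

# Joiner and variation selector that often accompany emoji sequences.
_EMOJI_GLUE = {"\u200d", "\ufe0f"}

def strip_symbols(text: str) -> str:
    """Drop characters in category 'So' plus the emoji glue code points."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "So" and ch not in _EMOJI_GLUE
    )

# Letters from non-English scripts are kept; only the symbol is dropped.
cleaned = strip_symbols("Привет 📲 мир")
print(cleaned)
```

Because letters in any script have a letter category rather than `So`, non-English text passes through unchanged. The whitespace around the removed emoji is left as-is, so a follow-up whitespace normalisation may be needed.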

0 Answers