
I am trying to extract the emojis from each sentence and put them into a new column, but when I do, the new column is empty and the emojis are still in the sentence.

For reference, my dataset looks like this - but contains over 70,000 sentences similar to this:

Sentence
You look good
Love you ❤️
I am so happy today

So far, I have tried this method:

import pandas as pd 
import emoji

df['emojis'] = df['Sentence'].apply(lambda row: ''.join(c for c in row if c in emoji.UNICODE_EMOJI))
df

And this method:

def extract_emojis(text):
    return ''.join(c for c in text if c in emoji.UNICODE_EMOJI)

df['emojis'] = df['Sentence'].apply(extract_emojis)
df

However, when I try them, my final output seems to be this:

Sentence             Emojis
You look good
Love you ❤️
I am so happy today

Instead, I want my output to look like this:

Sentence             Emojis
You look good
Love you             ❤️
I am so happy today

As well as that, I have also tried this method, which is exactly what I want to do:

import pandas as pd
import emoji as emj

def extract_emoji(df):
    df["emoji"] = ""
    for index, row in df.iterrows():
        for emoji in EMOJIS:
            if emoji in row["Sentence"]:
                row["Sentence"] = row["Sentence"].replace(emoji, "")
                row["emoji"] += emoji

extract_emoji(df)
print(df.to_string())

However, with the method above, the code does not seem to finish executing. I think it cannot handle so many rows, as I have over 70,000 sentences from which the emojis need extracting.

As you can see, I am nearly there, but not fully.

These three methods have not fully worked for me, and I require some additional help.

In summary, I just want to extract the emojis from each sentence and add them into a new column - if this is possible.

Thank you very much.

1 Answer


Try:

import re
import emoji

pattern = re.compile(r"|".join(map(re.escape, emoji.UNICODE_EMOJI["en"])))

df["Emojis"] = df["Sentence"].apply(lambda x: "".join(pattern.findall(x)))
df["Sentence"] = df["Sentence"].apply(lambda x: pattern.sub("", x))
print(df)

Prints:

               Sentence  Emojis
0        You look  good      
1              Love you      ❤️
2   I am so happy today      
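
A likely reason the attempts in the question return an empty column: in the `emoji` 1.x releases, `emoji.UNICODE_EMOJI` appears to be a dict keyed by language code (`"en"`, `"es"`, ...), so `c in emoji.UNICODE_EMOJI` compares each character against those codes and never matches. Checking one character at a time also cannot match an emoji made of several code points, such as ❤️ (U+2764 followed by U+FE0F). The sketch below illustrates both points with a small hand-written stand-in table instead of the real library data, so the table contents and the names `extract_per_char`, `PATTERN` and `split_emojis` are hypothetical. It also sorts the alternatives longest first, which is an extra precaution of mine and not part of the code above.

```python
import re

# Hypothetical stand-in for emoji.UNICODE_EMOJI (emoji 1.x layout assumed):
# outer keys are language codes, inner keys are the emoji strings.
UNICODE_EMOJI = {
    "en": {
        "\u2764": ":red_heart:",        # bare heart, one code point
        "\u2764\ufe0f": ":red_heart:",  # heart + variation selector, two code points
        "\U0001F600": ":grinning_face:",
    }
}


def extract_per_char(text):
    # The question's approach: each character is tested against the OUTER
    # dict, whose only keys are language codes, so nothing ever matches.
    return "".join(c for c in text if c in UNICODE_EMOJI)


# Longest alternatives first, so a multi-code-point emoji is matched whole
# instead of stopping at a shorter emoji that is its prefix.
_alternatives = sorted(UNICODE_EMOJI["en"], key=len, reverse=True)
PATTERN = re.compile("|".join(map(re.escape, _alternatives)))


def split_emojis(text):
    """Return (text with emojis removed, the removed emojis joined together)."""
    found = "".join(PATTERN.findall(text))
    cleaned = PATTERN.sub("", text).strip()
    return cleaned, found


print(repr(extract_per_char("Love you \u2764\ufe0f")))
print(split_emojis("Love you \u2764\ufe0f"))
```

With a DataFrame, `split_emojis` could be applied to the `Sentence` column through `Series.apply` in the same way as the lambdas above. As far as I know, newer `emoji` releases (2.x) dropped `UNICODE_EMOJI` in favour of `emoji.EMOJI_DATA`, so the source of the emoji strings may need adjusting depending on the installed version.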
Andrej Kesely
  • Hi - thanks for the reply. First of all, should ```import emojis``` be ```import emoji```? Also, when I execute the code, the code does work, but it removes the emoji from all the sentences and adds a blank column with no emojis at all. It is doing exactly what the previous methods were doing in my post. –  Jun 26 '21 at 08:50
  • @AnandP2812 It's `emoji`, sorry for typo ( https://pypi.org/project/emoji/ ). Did you create a pattern using `emoji.UNICODE_EMOJI["en"]` ? – Andrej Kesely Jun 26 '21 at 09:10
  • Hi, no problem with the typo. By creating a pattern, are you referring to this code? ```pattern = re.compile(r"|".join(map(re.escape, emoji.UNICODE_EMOJI["en"])))``` –  Jun 26 '21 at 10:43
  • @AnandP2812 yes, this is the pattern. – Andrej Kesely Jun 26 '21 at 10:44
  • Hello again - thanks very much. I separated that line of code, and it has successfully worked. Thanks again! –  Jun 26 '21 at 17:39