My research interest is the effect of emojis in text for sentiment analysis. I would like to extract all the emojis from my dataset. So far I have done the following:

    import re
    from emoji import UNICODE_EMOJI

    emoji_1 = re.compile('[\\u203C-\\u3299\\U0001F000-\\U0001F644]')

    emoji_list = list(filter(emoji_1.match, df['Tweet text']))

    emo_found = ' '.join(emoji for emoji in emoji_list)

    def get_emoji_set(text):
        return {letter for letter in text if letter in UNICODE_EMOJI['en']}

    c = get_emoji_set(emo_found)

    print(c)

But it does not extract all the emojis. So far, the code above returns only the following:

{'', '', '', '', '', '', '', ''}

However, these are only some of the emojis present in the dataset. The following emojis are also in my dataset but are missing from the result:

, , , , ,,,, + more emojis

Why is my code not extracting all the emojis from my dataset? Are some emojis outside the ranges I defined in emoji_1? Are there more ranges that I should include in the regex?

I have tried the following answer, but it does not return anything. I get an empty column.

Extract emoji from series of text

CD_NS
  • [this](https://stackoverflow.com/questions/63762570/extract-emoji-from-series-of-text) worked beautifully for me. Checked other solutions with many votes but they were either using deprecated functions or else did not work – Simone Jul 25 '23 at 09:36

1 Answer

Something like the demoji library might help.

Accurately find or remove emojis from a blob of text using data from the Unicode Consortium's emoji code repository.
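
As for why the code in the question misses emojis, two things seem likely. First, `re.match` only tests the start of each string, so `filter(emoji_1.match, ...)` keeps only tweets that begin with an emoji. Second, the range in `emoji_1` stops at U+1F644, so emojis in later blocks (for example U+1F680 or U+1F914) can never match. Here is a minimal standard-library sketch of both effects. The sample tweets and the wider upper bound U+1FAFF are assumptions chosen for illustration, not a complete definition of what counts as an emoji.

```python
import re

# The character class from the question: its upper range ends at U+1F644.
narrow = re.compile('[\u203C-\u3299\U0001F000-\U0001F644]')

# Assumed wider range for illustration only; it also covers the
# transport (U+1F680...) and supplemental (U+1F900...) blocks.
wide = re.compile('[\u203C-\u3299\U0001F000-\U0001FAFF]')

# Hypothetical sample data standing in for df['Tweet text'].
tweets = [
    "\U0001F600 starts with an emoji",
    "emoji at the end \U0001F600",
    "thinking \U0001F914 and a rocket \U0001F680",
]

# match() anchors at position 0, so only the first tweet survives the filter.
kept_by_match = [t for t in tweets if narrow.match(t)]

# findall() scans the whole string and returns every matching character.
found_narrow = [narrow.findall(t) for t in tweets]
found_wide = [wide.findall(t) for t in tweets]

print(len(kept_by_match))   # number of tweets kept by the match-based filter
print(found_narrow)         # third tweet yields nothing with the narrow range
print(found_wide)           # third tweet yields both characters
```

A character class like this matches one code point at a time, so flags, skin-tone variants, and ZWJ sequences come back in pieces. That is the kind of case where a dedicated library such as demoji is likely to do better than a hand-written range.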

E.Eldridge
  • I am trying to do it with the following code: `df["Emoji list"] = demoji.findall(df['Tweet text'].map(str))` but I am getting the error 'expected string or bytes-like object' – CD_NS Oct 02 '21 at 10:04
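
The error in the comment above is the message Python's `re` functions raise when they are given something other than a string. That suggests the whole Series was passed to a function that expects a single string. Here is a sketch of the per-element approach, using only the standard library. `find_emojis` is a hypothetical stand-in for a per-string extractor such as `demoji.findall`, and both the sample list and the character range are assumptions.

```python
import re

# Hypothetical stand-in for a per-string extractor such as demoji.findall.
# The range is an illustrative assumption, not a full emoji definition.
EMOJI = re.compile('[\u203C-\u3299\U0001F000-\U0001FAFF]')

def find_emojis(text):
    """Return the emoji-like characters found in a single string."""
    return EMOJI.findall(text)

# Hypothetical sample data standing in for df['Tweet text'].
tweets = ["good morning \U0001F600", "no emoji here", "launch \U0001F680\U0001F680"]

# Passing the whole collection at once fails with a TypeError,
# because the regex engine expects one string, not a container.
try:
    find_emojis(tweets)
    whole_collection_ok = True
except TypeError:
    whole_collection_ok = False

# Calling the extractor once per element works.
emoji_per_tweet = [find_emojis(str(t)) for t in tweets]

# With pandas and demoji installed, the per-row equivalent would be
# along the lines of:
#   df["Emoji list"] = df["Tweet text"].astype(str).apply(demoji.findall)

print(whole_collection_ok)
print(emoji_per_tweet)
```

Note that `demoji.findall` returns a dict mapping each emoji to its description, so the resulting column would hold dicts, not lists.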