
I have a problem extracting emojis from a pandas Series. This is the code I used:

import emoji

def extract_emojis(text):
    return ''.join(c for c in text if c in emoji.UNICODE_EMOJI)

for text in df['comments']:
    df['emoji'] = extract_emojis(text)

Output:

     comments                                            | emoji
0    Its very beautiful                                  |
1    Your new bike, @keir ...?                           |
2    @philip                                             |
3    Any news on the Canadian expansion mentioned i...   |
4    Rocky Mountain ❤️                                   |
...  ...                                                 | ...

Checking the function on a single string:

text = '@philip '
extract_emojis(text)
--> '\U0001f929\U0001f929'        

Expected result:

     comments                                            | emoji
0    Its very beautiful                                  |
1    Your new bike, @keir ...?                           |
2    @philip                                             |
3    Any news on the Canadian expansion mentioned i...   |
4    Rocky Mountain ❤️                                   | ❤️
...  ...                                                 | ...

Note: I only asked this question after reading these related questions:
Python unicode character conversion for Emoji
How to extract all the emojis from text?

Luc

1 Answer


Rather than iterating over the rows yourself, you can apply the function to the whole column with `apply`, passing either a lambda or a named function.

import pandas as pd
import emoji

df = pd.DataFrame([['@philip  '],
                   ['Rocky Mountain ❤️']], columns=['comments'])

Using a lambda:

df['emojis'] = df['comments'].apply(lambda text: ''.join(c for c in text if c in emoji.UNICODE_EMOJI))
df

Using a named function:

def extract_emojis(text):
    return ''.join(c for c in text if c in emoji.UNICODE_EMOJI)

df['emoji_apply'] = df['comments'].apply(extract_emojis)
df

Output:

comments             emojis
@philip
Rocky Mountain ❤️    ❤
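Notice that the output shows ❤ rather than the ❤️ in the comment. The sketch below shows why, assuming (as the output suggests) that the lookup table contains the bare heart character but not the variation selector on its own: ❤️ is two code points, and a per-character filter keeps only the first. The set `known_emoji` is a hypothetical stand-in for `emoji.UNICODE_EMOJI`, used so the snippet runs without the emoji package.

```python
import unicodedata

# '❤️' as typed in the comment: a base character plus a variation selector
heart = '\u2764\ufe0f'
for ch in heart:
    print(f'U+{ord(ch):04X} {unicodedata.name(ch)}')
# U+2764 HEAVY BLACK HEART
# U+FE0F VARIATION SELECTOR-16

# Hypothetical stand-in for the emoji lookup table: it contains the
# base heart but not the bare selector
known_emoji = {'\u2764'}

# Same per-character filter as extract_emojis
kept = ''.join(c for c in 'Rocky Mountain ' + heart if c in known_emoji)
print(kept == '\u2764')  # True: the selector was dropped
```

If you need whole sequences (hearts with selectors, flags, ZWJ sequences), newer releases of the emoji package provide `emoji.emoji_list()`, which matches complete emoji rather than single code points. Also note that emoji 2.x no longer has `UNICODE_EMOJI` (it was replaced by `EMOJI_DATA`), so the code above needs adjusting on current releases; check the documentation for the version you have installed.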
Equinox
  • Thanks a lot! Just to improve my understanding, could you tell me why the for loop returns empty results? – Luc Sep 06 '20 at 09:54
  • @Luc When you do `df['column'] = some_value` with a single value, pandas assigns that value to every row of the column. Your loop does this once per comment, so each iteration overwrites the whole column and the DataFrame ends up holding only the result of the last iteration. – Equinox Sep 06 '20 at 10:08
  • Spent 2 hours trying different solutions with many votes. None worked; only this one did. Thanks!! – Simone Jul 25 '23 at 09:35
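A minimal sketch of what Equinox describes. It uses a hypothetical stand-in function, `shout`, in place of `extract_emojis` so it runs without the emoji package; any function that maps one string to one value behaves the same way.

```python
import pandas as pd

df = pd.DataFrame({'comments': ['first', 'second', 'third']})

def shout(text):
    # Hypothetical stand-in for extract_emojis
    return text.upper()

# The loop from the question: each assignment broadcasts one scalar
# to every row, so each iteration overwrites the previous one
for text in df['comments']:
    df['loop_result'] = shout(text)
print(df['loop_result'].tolist())   # ['THIRD', 'THIRD', 'THIRD']

# One value per row: apply the function to the Series...
df['apply_result'] = df['comments'].apply(shout)
# ...or build the full list first and assign it once
df['list_result'] = [shout(text) for text in df['comments']]
print(df['apply_result'].tolist())  # ['FIRST', 'SECOND', 'THIRD']
```

In the question, the last comment in `df['comments']` presumably contained no emoji, so every row received the same empty string.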