I am working in Python with pandas, and I have a DataFrame in which one column contains phrases that include emojis, such as "when life gives you 🍋s, make lemonade" or "Catch a falling ⭐️ and put it in your pocket". Not all the phrases have emojis, and when they do, the emoji can be anywhere in the phrase (not just at the beginning or end). I want to go through each text and count the frequency of each emoji that appears, find the emojis that appear most often, and so on. I am not sure how to actually recognize the emojis. If I iterate over the texts in the column, how would I identify each emoji so I can gather the desired information such as counts, max, etc.?
- Possible duplicate of [How to find and count emoticons in a string using python?](http://stackoverflow.com/questions/19149186/how-to-find-and-count-emoticons-in-a-string-using-python) – hashcode55 Feb 25 '17 at 09:40
- The solutions posted there don't work for me. If you're familiar with this, would you be willing to help? – Jane Sully Feb 25 '17 at 18:15
- Yeah sure! I think the solutions are not working for you because the emoticons in your phrases are outside the Unicode ranges used in those answers... Try re-adjusting the range and it should work. – hashcode55 Feb 25 '17 at 19:08
- Okay! That makes complete sense. Do you know how to find suitable ranges? I still have some other emojis that aren't being recognized and am not sure how to appropriately increase the range. I appreciate your help! – Jane Sully Feb 25 '17 at 21:05
- Yeah, I was just fixing that :) I'll edit the answer. – hashcode55 Feb 25 '17 at 21:09
- Great! Last question, I promise. If you don't mind me asking, how did you determine the range? I would like to be able to come up with that myself, but I am not really sure how to. Thanks again :) – Jane Sully Feb 25 '17 at 21:23
- Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/136646/discussion-between-hashcode55-and-jane-sully). – hashcode55 Feb 25 '17 at 21:24
1 Answer
Suppose you have a DataFrame like this:

    import re
    from collections import defaultdict

    import pandas as pd

    df = pd.DataFrame({'phrases': ["Smiley emoticon rocks! I like you.\U0001f601",
                                   "Catch a falling ⭐️ and put it in your pocket"]})

which yields

                                            phrases
    0           Smiley emoticon rocks! I like you.😁
    1  Catch a falling ⭐️ and put it in your pocket

You can then do something like:

    # Dictionary storing emoji counts
    emoji_count = defaultdict(int)
    for i in df['phrases']:
        # Each match is a single character from one of the two ranges
        for emoji in re.findall(u'[\U0001f300-\U0001f650]|[\u2000-\u3000]', i):
            emoji_count[emoji] += 1

    print(emoji_count)

Note that I have changed the range in re.findall(u'[\U0001f300-\U0001f650]|[\u2000-\u3000]', i).
The second alternative (the part after the |) handles a different Unicode block, but you should get the idea.
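
One way to work out which ranges are needed is to look at the code points of the characters that are being missed. The sketch below is not part of the original answer: it assumes Python 3, and `describe_code_points` is a hypothetical helper name. It lists each non-ASCII character with its code point and Unicode name, so you can check whether it falls inside the regex ranges.

```python
import unicodedata


def describe_code_points(text):
    """Return (character, code point, Unicode name) for each non-ASCII character."""
    rows = []
    for ch in text:
        if ord(ch) > 127:  # skip plain ASCII letters, digits and punctuation
            code_point = "U+%04X" % ord(ch)
            name = unicodedata.name(ch, "UNKNOWN")  # fallback if the character is unnamed
            rows.append((ch, code_point, name))
    return rows


sample = "Catch a falling \u2b50\ufe0f and put it in your pocket \U0001f601"
rows = describe_code_points(sample)
for row in rows:
    print(row)
```

For the star phrase this reports U+2B50 followed by U+FE0F (a variation selector), so a range such as \u2000-\u3000 picks up the star itself but not the selector.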
In Python 2.x you can convert the emoji to unicode to see its code points:

    unicode('⭐️', 'utf-8')  # output: u'\u2b50\ufe0f'

Output of the counting loop:

    defaultdict(<class 'int'>, {'😁': 1, '⭐': 1})

That regex is shamelessly stolen from this link.
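
To get at the "which emoji appears most" part of the question, the same idea can be written with `collections.Counter`. This is a minimal sketch rather than the answer's own code: it assumes Python 3, reuses the ranges from the regex above, and `count_emojis` and `EMOJI_PATTERN` are names made up for illustration. It uses a plain list so that it runs without pandas; iterating over a column such as df['phrases'] yields the same strings.

```python
import re
from collections import Counter

# Same ranges as the regex in the answer; widen them if some emoji are missed.
EMOJI_PATTERN = re.compile("[\U0001f300-\U0001f650]|[\u2000-\u3000]")


def count_emojis(phrases):
    """Count every character matched by EMOJI_PATTERN across all phrases."""
    counts = Counter()
    for phrase in phrases:
        counts.update(EMOJI_PATTERN.findall(phrase))
    return counts


phrases = [
    "Smiley emoticon rocks! I like you.\U0001f601",
    "Catch a falling \u2b50\ufe0f and put it in your pocket",
    "Another \u2b50 for good measure",
]
counts = count_emojis(phrases)
print(counts.most_common(1))   # most frequent emoji together with its count
print(sum(counts.values()))    # total number of matches
```

Keep in mind that \u2000-\u3000 also covers general punctuation such as curly quotes and dashes, so real text may need a narrower range or a filter on the results.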


hashcode55