I'm working with some Twitter data and want to filter the emoticons into a list. The data itself is encoded in UTF-8. I read the file line by line; here are three example lines:
['This', 'is', 'a', 'test', 'tweet', 'with', 'two', 'emoticons', '', '⚓️']
['This', 'is', 'another', 'tweet', 'with', 'a', 'emoticon', '']
['This', 'tweet', 'contains', 'no', 'emoticon']
I'd like to collect the emoticons for each line like this:
['', '⚓️']
and so on.
I did some research and found that there's an 'emoji' package for Python. I tried to use it in my code like this:
import emoji
with open("file.txt", "r", encoding='utf-8') as f:
    for line in f:
        elements = []
        col = line.strip('\n')
        cols = col.split('\t')
        elements.append(cols)
        emoji_list = []
        data = re.findall(r'\X', elements)
        for word in data:
            if any(char in emoji.UNICODE_EMOJI for char in word):
                emoji_list.append(word)
First try
import emoji
with open("file.txt", "r", encoding='utf-8') as f:
    for line in f:
        elements = []
        col = line.strip('\n')
        cols = col.split('\t')
        elements.append(cols)
        emoji_list = []
        for c in elements:
            if c in emoji.UNICODE_EMOJI:
                emojilist.append(c)
Second Try
I also tried the examples given in "How to extract all the emojis from text?", but they didn't work for me and I'm not sure what I did wrong.
I'd really appreciate any help to extract the emoticons, thanks in advance! :)
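
One possible starting point, offered as a minimal sketch rather than a definitive fix: in both attempts, elements.append(cols) wraps the token list inside another list, so the later loop sees one whole list instead of individual tokens. The sketch below checks each token directly. It assumes the file is tab-separated, as in the attempts above, and it swaps in a standard-library heuristic (Unicode general category "So", i.e. "Symbol, other") in place of the emoji package's lookup table. That heuristic also matches non-emoji symbols such as the copyright sign, so treat it as an approximation. The names looks_like_emoji and emojis_in_line are hypothetical helpers introduced only for this illustration.

```python
import unicodedata


def looks_like_emoji(token):
    """Heuristic: True if any character of the token has Unicode category 'So'.

    This is an approximation, not the emoji package's definition of an emoji.
    """
    return any(unicodedata.category(ch) == "So" for ch in token)


def emojis_in_line(line, sep="\t"):
    """Split one line into tokens and keep the tokens that look like emoji."""
    tokens = line.rstrip("\n").split(sep)
    return [tok for tok in tokens if looks_like_emoji(tok)]


# Assumed sample input: tab-separated tokens, one tweet per line.
sample_lines = [
    "This\tis\ta\ttest\ttweet\t\u2693\ufe0f\n",
    "This\ttweet\tcontains\tno\temoticon\n",
]

for line in sample_lines:
    print(emojis_in_line(line))
```

To use this on the real data, the loop over sample_lines could be replaced by a loop over the open file, collecting emojis_in_line(line) once per line. If the emoji package is preferred, the same structure should work with its own membership test substituted for looks_like_emoji.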