I have a dataframe with a column "clear_message", and I created a column that counts all the words in each row.
history['word_count'] = history.clear_message.apply(lambda x: Counter(x.split(' ')))
For example, if a row's message is: Hello my name is Hello
then the counter in that row will be Counter({'Hello': 2, 'is': 1, 'my': 1, 'name': 1})
The problem
My text contains emoji, and I also want each emoji to be counted.
For example:
test = 'here sasdsa'
test_counter = Counter(test.split(' '))
The output is:
Counter({'sasdsa': 1, 'here': 1})
But I want:
Counter({'sasdsa': 1, '': 5, 'here':1})
Clearly the problem is that I'm using split(' ').
What I thought about:
Adding a space before and after each emoji, like:
test = ' here sasdsa'
And then use the split, which will work.
- Not sure this approach is the best.
- Not sure how to do it. (I do know that if i is an emoji, then i in emoji.UNICODE_EMOJI will return True (the emoji package).)
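
The padding idea could be sketched roughly as follows. This is only an illustration under stated assumptions: the helper names looks_like_emoji and count_words_and_emoji are hypothetical, and the default emoji test uses the standard library's Unicode category "So" (Symbol, other) as a rough stand-in for a real emoji lookup, so it will also match some non-emoji symbols. A check based on the emoji package could be passed in as is_emoji instead.

```python
import unicodedata
from collections import Counter


def looks_like_emoji(ch):
    # Assumption: approximate "is an emoji" with Unicode category "So".
    # This is a rough stand-in, not the emoji package's own lookup.
    return unicodedata.category(ch) == "So"


def count_words_and_emoji(text, is_emoji=looks_like_emoji):
    # Surround every emoji-like character with spaces so it becomes its own token.
    padded = "".join(f" {ch} " if is_emoji(ch) else ch for ch in text)
    # split() with no argument collapses runs of whitespace,
    # so the extra padding does not produce empty-string tokens.
    return Counter(padded.split())


# Example with U+1F600 written as an escape sequence.
example = count_words_and_emoji("here \U0001F600\U0001F600 sasdsa\U0001F600")
```

Applied to the dataframe, this would presumably be history.clear_message.apply(count_words_and_emoji). Note that the sketch works one code point at a time, so emoji built from several code points (skin-tone modifiers, joined sequences) would not be counted as a single unit.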