
I have a dataset that contains a lot of sentences, and I'd like to extract the emojis from them to see which ones are used most.

Here is the data:

library(tibble)

df <- tribble(
  ~sentence, 
  "Ask yourself:  “Does this person fully understand all of their responsibilities?",
  "What do you see as the reason for it? ",
  "Your goal is to get perspective on the situation. ✅"
)

The desired output could be something like this:

df <- tribble(
  ~emojis, ~used,
  "",     1,
  "",     2, 
  "",     1, 
  "✅",     1, 
)

How can I do this?

datazang

1 Answer


If we reinterpret the request for emoji as a request for non-ASCII characters, we can get a bit of help from stringr and dplyr to build the desired data frame:

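library(dplyr)  # provides %>%, tibble() and count()

# Pull out every non-ASCII character, flatten the list, and count each one
stringr::str_extract_all(df$sentence, "[^[:ascii:]]") %>%
  unlist() %>%
  tibble(emoji = .) %>%
  dplyr::count(emoji)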

This simplification helps because, as far as I know, emoji aren't single characters in one specific Unicode range that is easy to match with a regex.
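One side effect of treating every non-ASCII character as an emoji is that typographic characters, such as the curly quote in the first sample sentence, get counted too. Here is a minimal sketch of one way to trim those out, assuming it is acceptable to drop anything in the Unicode punctuation category (`\p{P}`). The name `emoji_counts` and the extra `filter()` step are illustrative additions, not part of the approach above.

```r
library(tibble)
library(dplyr)
library(stringr)

df <- tribble(
  ~sentence,
  "Ask yourself:  “Does this person fully understand all of their responsibilities?",
  "What do you see as the reason for it? ",
  "Your goal is to get perspective on the situation. ✅"
)

emoji_counts <- str_extract_all(df$sentence, "[^[:ascii:]]") %>%
  unlist() %>%
  tibble(emoji = .) %>%
  # Assumption: non-ASCII punctuation (e.g. curly quotes) is not wanted
  filter(!str_detect(emoji, "\\p{P}")) %>%
  # sort = TRUE puts the most frequently used characters first
  count(emoji, sort = TRUE)

emoji_counts
```

On the sample data this should leave the check mark as the only row. It is still a heuristic: accented letters and other non-ASCII symbols would pass the filter, and emoji made of several code points would be split into their individual parts.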

MrFlick