How to pull emojis from a column?

Question

I have a dataset that contains a lot of sentences, and I'd like to pull emojis in order to understand which emojis were used most.

Here is the data:

library(tibble)

df <- tribble(
  ~sentence, 
  "Ask yourself:  “Does this person fully understand all of their responsibilities?",
  "What do you see as the reason for it? ",
  "Your goal is to get perspective on the situation. ✅"
)

The desired output could be something like this:

df <- tribble(
  ~emojis, ~used,
  "",     1,
  "",     2, 
  "",     1, 
  "✅",     1, 
)

How can I do this?

Related: https://stackoverflow.com/questions/43359066/how-can-i-match-emoji-with-an-r-regex — MrFlick, Nov 29 '22 at 15:39

score 1 · Accepted Answer · answered Nov 29 '22 at 17:59

If we reinterpret the request for emoji as a request for non-ASCII characters, stringr and dplyr can help build the desired data frame:

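library(dplyr)  # provides %>%, tibble() and count()

# Extract every non-ASCII character, flatten the list, and count each one
stringr::str_extract_all(df$sentence, "[^[:ascii:]]") %>%
  unlist() %>%
  tibble(emoji = .) %>%
  dplyr::count(emoji)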

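This simplification helps because emoji aren't single characters in a specific Unicode character range that's easy to pull out with a regex (as far as I know). Note that it also counts any other non-ASCII character, such as the curly quote “ in the first sentence.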
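
If the non-ASCII shortcut turns out to be too broad, one possible refinement is to match on a Unicode property instead. The sketch below is not part of the original answer. It assumes that the ICU library behind stringr recognises the `Extended_Pictographic` property, and it uses made-up sample sentences with the emoji written as Unicode escapes, because the emoji in the question's data did not survive.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data (not the asker's original sentences).
# \u201c is a curly quote, \U0001F914 and \U0001F44D are emoji, \u2705 is the check mark.
df <- tibble(sentence = c(
  "Ask yourself: \u201cDoes this person understand? \U0001F914",
  "What do you see as the reason for it? \U0001F914 \U0001F44D",
  "Your goal is to get perspective on the situation. \u2705"
))

# The non-ASCII pattern from the answer also picks up the curly quote
non_ascii <- unlist(str_extract_all(df$sentence, "[^[:ascii:]]"))

# Narrower pattern: only code points flagged as pictographic
# (assumption: the installed ICU supports this property name)
emoji_counts <- df$sentence %>%
  str_extract_all("\\p{Extended_Pictographic}") %>%
  unlist() %>%
  tibble(emoji = .) %>%
  count(emoji, name = "used", sort = TRUE)

print(emoji_counts)
```

Both patterns work one code point at a time, so emoji built from several code points (skin-tone variants, flags, joined sequences) would probably be split into pieces or only partly matched. For data that contains those, a dedicated emoji lookup table may be a better fit.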
