I have a dataset that contains a title column, and I want to extract the words from it. I used the count() function to get the total number of occurrences of each word, and then plotted the most frequent ones. Here is the code:
install.packages("remotes")
remotes::install_github("tweed1e/werfriends")
library(werfriends)
friends_raw <- werfriends::friends_episodes
library(tidytext)
library(tidyverse)
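# Extend the tidytext stop word list with a few custom entries
custom_stop_words <- bind_rows(
  tibble(word = c("1", "2", "one"),
         lexicon = c("custom", "custom", "custom")),
  stop_words
)
friends_raw %>%
  unnest_tokens(word, title) %>%                 # one row per word in each title
  mutate(word = str_remove(word, "'s")) %>%      # strip possessive endings
  anti_join(custom_stop_words, by = "word") %>%  # drop stop words
  count(word) %>%
  top_n(10, n) %>%                               # keep the 10 most frequent words (ties included)
  mutate(word = fct_reorder(word, n)) %>%        # order bars by frequency
  ggplot(aes(x = word, y = n)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(breaks = seq(0, 30, 5))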
In the friends_raw dataset there is also a season column for each title, and I would like the plot to show, via fill, the seasons in which the occurrences happen. The problem is that with this approach I don't know how to keep the season column while doing the count and still get the results ordered by total frequency.
Any clues on how to perform this?
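
For reference, here is a minimal sketch of the kind of result I am after, built on made-up data rather than the real friends_raw. The tokens tibble, the column name total, and the cutoff of 2 top words are assumptions for illustration only. The idea would be to count by word and season together, attach each word's overall total with add_count(), and then order and filter on that total:

```r
library(dplyr)
library(forcats)
library(ggplot2)

# Hypothetical toy data standing in for the tokenised, stop-word-filtered titles
tokens <- tibble(
  season = c(1, 1, 2, 2, 3, 3, 3),
  word   = c("rachel", "ross", "rachel", "ross", "rachel", "monica", "rachel")
)

# Count per word/season pair, then add each word's overall total to every row
word_season_counts <- tokens %>%
  count(word, season) %>%
  add_count(word, wt = n, name = "total")

# Pick the most frequent words overall (2 here; 10 in the real plot)
top_words <- word_season_counts %>%
  distinct(word, total) %>%
  slice_max(total, n = 2, with_ties = FALSE)

# Keep only the top words and order the factor by overall total
plot_data <- word_season_counts %>%
  semi_join(top_words, by = "word") %>%
  mutate(word = fct_reorder(word, total),
         season = factor(season))

# Stacked bars: bar length is the overall total, fill shows the season
p <- ggplot(plot_data, aes(x = word, y = n, fill = season)) +
  geom_col() +
  coord_flip()
```

With the real data, tokens would presumably be replaced by the output of the unnest_tokens() and anti_join() steps above, with the season column left in place.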