I'm doing text mining on a corpus of TXT files, and I'm trying to display the most frequent terms in a bar graph, sorted by number of occurrences in the corpus.
library(dplyr)
library(tidytext)
library(ggplot2)

# create corpus from folder with TXT files
raw_text <- read_folder(input.dir)
tidy_text <- raw_text %>%
  group_by(id) %>%
  unnest_tokens(word, text)
# ATTEMPT #1
# count most frequent words
# and display words that appear over 200 times in the corpus
tidy_text %>%
  count(word, sort = TRUE) %>%
  filter(n > 200) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()
This gave me a bar graph sorted in reverse alphabetical order, NOT by n.
I also tried this one:
# ATTEMPT #2
# count most frequent words
# and display words that appear over 200 times in the corpus
tidy_text %>%
  count(word, sort = TRUE) %>%
  filter(n > 200) %>%
  ggplot(aes(reorder(word, n), n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()
This gave me a bar graph whose terms were in no particular order, neither alphabetical nor sorted by n.
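In case it helps anyone reproduce my setup without my files, here is a minimal sketch with made-up data. The three toy documents and the name word_counts are placeholders invented for illustration, and I'm assuming read_folder returns a data frame with id and text columns, since that is what my code relies on. It builds the same counts table that both attempts pass to ggplot and prints it, together with any grouping variables still attached to tidy_text, so the input to the plot can be inspected directly.

```r
library(dplyr)
library(tidytext)

# Made-up stand-in for read_folder(input.dir); the columns `id` and `text`
# are an assumption based on how they are used in the pipeline
raw_text <- tibble(
  id   = c("doc1", "doc2", "doc3"),
  text = c("apple banana apple cherry",
           "banana apple banana",
           "cherry apple")
)

# Same tokenizing step as in my real code
tidy_text <- raw_text %>%
  group_by(id) %>%
  unnest_tokens(word, text)

# Same counting step as in both attempts (without the n > 200 filter,
# which would remove everything in a corpus this small)
word_counts <- tidy_text %>%
  count(word, sort = TRUE)

# Inspect what ggplot() would receive
print(word_counts)

# Show which grouping variables, if any, tidy_text still carries
print(group_vars(tidy_text))
```

On my real data, the same two print calls can be run on tidy_text and on the result of count() to see exactly which rows reach the plot.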
If anybody can help me sort my bar graph by n, I'd really appreciate it! Thanks.