I'm doing text mining on a corpus of TXT files, and I'm trying to display the most frequent terms in a bar graph, sorted by number of occurrences in the corpus.

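library(dplyr)
library(tidytext)
library(ggplot2)

# create corpus from folder with TXT files
# (read_folder and input.dir are defined elsewhere)
raw_text <- read_folder(input.dir)
tidy_text <- raw_text %>%
  group_by(id) %>%
  unnest_tokens(word, text)
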
# ATTEMPT #1
# count most frequent words
# and display words that appear over 200 times in the corpus
tidy_text %>%
  count(word, sort = TRUE) %>%
  filter(n > 200) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()

This produced a bar graph sorted in reverse alphabetical order, NOT by n.

I also tried this one:

# ATTEMPT #2
# count most frequent words
# and display words that appear over 200 times in the corpus
tidy_text %>%
  count(word, sort = TRUE) %>%
  filter(n > 200) %>%
  ggplot(aes(reorder(word, n), n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()

This produced a bar graph whose terms were in no particular order: neither alphabetical nor sorted by n.

If anybody were able to help me sort my bar graph by n, I'd really appreciate it! Thanks.

Rehlein
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick May 20 '21 at 19:40
  • It's hard to tell without an example, but maybe you need `ggplot(aes(reorder(factor(word), n), n))` if `word` is not already a factor – MrFlick May 20 '21 at 19:42
  • Thank you, I appreciate the support! I think there must be a problem with my data set, not with the sorting algorithm. I tried lots of exemplary code and they all sort in different ways. I'll keep fiddling and maybe I'll come up with a solid reprex soon. – Rehlein May 21 '21 at 20:28
  • Something like `dput(head(tidy_text %>% count(word, sort = TRUE) %>% filter(n > 200), 10))` would help a lot. – MrFlick May 21 '21 at 20:30
  • Thank you for your patience! I entered the comment and the output is too long for a comment. I created a minimal reproducible example and posted a new question. [link](https://stackoverflow.com/questions/67708519/order-ggplot-bar-graph-by-descending-n-with-mre-not-yet-answered-elsewhere) – Rehlein May 26 '21 at 15:55
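
A minimal sketch of the kind of self-contained example the comments ask for. It also illustrates one possible cause, which the thread does not confirm: `tidy_text` is still grouped by `id`, so `count()` tallies per document and `reorder()` runs within each group. The data frame `toy_text` and its values are hypothetical stand-ins for the real corpus.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for tidy_text: one row per token,
# still grouped by document id as in the question.
toy_text <- tibble(
  id   = c("a", "a", "a", "b", "b", "b", "b", "c", "c"),
  word = c("cat", "dog", "cat", "cat", "dog", "emu", "cat", "dog", "emu")
) %>%
  group_by(id)

# On grouped data, count() tallies per id AND word,
# so the same word can occupy several rows.
grouped_counts <- toy_text %>% count(word, sort = TRUE)

# Dropping the grouping first gives one row per word for the whole corpus,
# and reorder() then sets the factor levels by ascending n.
word_counts <- toy_text %>%
  ungroup() %>%
  count(word, sort = TRUE) %>%
  mutate(word = reorder(word, n))

levels(word_counts$word)  # "emu" "dog" "cat"

# Same plotting code as in the question, applied to the ungrouped counts.
p <- ggplot(word_counts, aes(word, n)) +
  geom_col() +
  xlab(NULL) +
  coord_flip()
```

If the real data behaves the same way, adding `ungroup()` before `count()` in either attempt may be enough. Otherwise, the output of `dput()` on the counted data would still be needed to diagnose the issue.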

0 Answers