I'm trying to use ggplot2 to plot my word-frequency rankings from quanteda. Passing the frequency variable (`freqs`) to base `plot()` works, but I want a nicer graph.
ggplot needs two variables in `aes()`. I've tried `seq_along()`, as suggested in a somewhat similar thread, but the graph draws nothing:
```r
ggplot(word_list, aes(x = seq_along(freqs), y = freqs, group = 1)) +
  geom_line() +
  labs(title = "Rank Frequency Plot", x = "Rank", y = "Frequency")
```
Any input appreciated!
```r
library(quanteda)
library(ggplot2)

symptoms_corpus <- corpus(X$TEXT, docnames = X$id)
summary(symptoms_corpus)
# print the text of any element of the corpus by index
cat(as.character(symptoms_corpus[6500]))

# create the document-feature matrix
# (quanteda >= 3 expects a tokens object here; dfm() on a corpus is deprecated)
Symptoms_DFM <- dfm(tokens(symptoms_corpus))
Symptoms_DFM

# sum columns for word counts
freqs <- colSums(Symptoms_DFM)
# get vocabulary vector
words <- colnames(Symptoms_DFM)
# combine words and their frequencies in a data frame
word_list <- data.frame(words, freqs)
# re-order the word list by decreasing frequency
word_indexes <- order(word_list[, "freqs"], decreasing = TRUE)
word_list <- word_list[word_indexes, ]
# show the most frequent words
head(word_list, 25)

# plot
ggplot(word_list, aes(x = seq_along(freqs), y = freqs, group = 1)) +
  geom_line() +
  labs(title = "Rank Frequency Plot", x = "Rank", y = "Frequency")
```
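A sketch of the same plot with the rank stored as an ordinary column rather than computed inside `aes()`. `word_ranks` is a hypothetical stand-in for the sorted `word_list`, filled with the counts from the example dataset. The log-scaled axes are an assumption about what "nicer" means: rank-frequency (Zipf-style) data are commonly drawn on log-log axes, because on linear axes the few very frequent words dominate and the long tail gets squeezed against the x-axis.

```r
library(ggplot2)

# Hypothetical stand-in for the sorted word_list (counts from the example data)
word_ranks <- data.frame(
  word  = c("the", "patient", "arm", "rash", "tingling", "was", "in", "not"),
  freqs = c(4116407, 3599537, 2582586, 1323883, 1220894, 1012042, 925339, 822150)
)

# Make sure rows run from most to least frequent, then store the rank explicitly
word_ranks <- word_ranks[order(word_ranks$freqs, decreasing = TRUE), ]
word_ranks$rank <- seq_len(nrow(word_ranks))

# Rank on x, frequency on y; both are numeric, so no group aesthetic is needed
p <- ggplot(word_ranks, aes(x = rank, y = freqs)) +
  geom_line() +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  labs(title = "Rank Frequency Plot", x = "Rank (log10)", y = "Frequency (log10)")

print(p)
```

Dropping `scale_x_log10()` and `scale_y_log10()` gives the linear-axis version, which should look like the base `plot()` output.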
By a nicer graph I mean this: the base `plot()` call below works and shows the ranking distribution, but it takes only one variable. ggplot needs two, and that's where I run into trouble: the ggplot code draws the graph, but no data is displayed.
```r
plot(word_list$freqs, type = "l", lwd = 2, main = "Rank frequency Plot", xlab = "Rank", ylab = "Frequency")
```
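The single-vector `plot()` call works because base R supplies the missing x itself: given one numeric vector, it plots the values against their index 1..n. The helper `xy.coords()`, which `plot.default()` uses to build its coordinates, makes that pairing visible. The counts below are toy values taken from the example dataset; the resulting index/value pairs are the two columns ggplot expects in `aes()`.

```r
y <- c(4116407, 3599537, 2582586, 1323883)

# With only one vector, xy.coords() uses it as y and the index 1..n as x
xy <- xy.coords(y)
print(xy$x)     # 1 2 3 4
print(xy$xlab)  # "Index", the default x label base plot() shows

# The same pairing written out as a two-column data frame
rank_freq <- data.frame(rank = xy$x, freqs = xy$y)
print(rank_freq)
```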
Example dataset below:
```r
first_column <- c("the", "patient", "arm", "rash", "tingling", "was", "in", "not")
second_column <- c("4116407", "3599537", "2582586", "1323883", "1220894", "1012042", "925339", "822150")
word_list2 <- data.frame(first_column, second_column)
colnames(word_list2) <- c("word", "freqs")
```
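In the example dataset the counts are quoted, so `freqs` ends up as a character column (or a factor in R versions before 4.0, where `stringsAsFactors` defaulted to `TRUE`). ggplot2 maps character columns to a discrete scale, so the y-axis would show the counts as text categories sorted as strings rather than as numbers. A minimal sketch of converting the column and adding a rank, assuming the example data exactly as posted:

```r
first_column <- c("the", "patient", "arm", "rash", "tingling", "was", "in", "not")
second_column <- c("4116407", "3599537", "2582586", "1323883",
                   "1220894", "1012042", "925339", "822150")
word_list2 <- data.frame(first_column, second_column, stringsAsFactors = FALSE)
colnames(word_list2) <- c("word", "freqs")

# Counts were quoted, so freqs is character; going through as.character() also
# covers the factor case, where as.numeric() alone would return level codes
word_list2$freqs <- as.numeric(as.character(word_list2$freqs))

# Rows are already in decreasing order of frequency, so row position is the rank
word_list2$rank <- seq_len(nrow(word_list2))

str(word_list2)
```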