I am trying to do topic modelling with LDA. To visualise the topic proportions in each document I tried using qplot, but I get an error and a warning:

Error: stat_bin() must not be used with a y aesthetic.
Warning message: `stat` is deprecated

The program uses the cora dataset (cora.documents and cora.vocab from the lda package).

This is the output plot I am looking for.

Below is the code snippet:

require("ggplot2")
require("reshape2")
require("lda")
# load documents and vocabulary
data(cora.documents)
data(cora.vocab)
theme_set(theme_bw())
# Number of topic clusters to display
K <- 10
# Number of documents to display
N <- 9
result <- lda.collapsed.gibbs.sampler(cora.documents,
                                      K, ## Num clusters
                                      cora.vocab,
                                      25, ## Num iterations
                                      0.1,
                                      0.1,
                                      compute.log.likelihood=TRUE)
# Get the top words in the cluster
top.words <- top.topic.words(result$topics, 5, by.score=TRUE)
# build topic proportions
topic.props <- t(result$document_sums) / colSums(result$document_sums)
document.samples <- sample(1:dim(topic.props)[1], N)
topic.props <- topic.props[document.samples,]
topic.props[is.na(topic.props)] <- 1 / K
colnames(topic.props) <- apply(top.words, 2, paste, collapse=" ")
topic.props.df <- melt(cbind(data.frame(topic.props),
                             document=factor(1:N)),
                       variable.name="topic",
                       id.vars = "document")

qplot(topic, value*100, fill=topic, stat = "identity",
      ylab="proportion (%)", data=topic.props.df,
      geom="histogram") +
  theme(axis.text.x = element_text(angle=0, hjust=1, size=12)) +
  coord_flip() +
  facet_wrap(~ document, ncol=3)
  • When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. You seem to be passing two variables to a univariate plot which doesn't make much sense. Maybe leave out the first `topic` in the call? Not sure exactly what you are after. – MrFlick May 01 '19 at 04:20
  • Without an example of the data, my best guess is that `geom = "col"` might work instead of `histogram`. – Marius May 01 '19 at 04:51
  • I have updated the question with the entire code. I only have a problem with the qplot part. I was able to solve the problem using Python, but out of interest I would like to solve it in R as well. Thanks in advance for any help! –  May 01 '19 at 05:08
  • Hi Marius, `geom = "col"` worked. Thank you so much. –  May 01 '19 at 05:13
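
For reference, the fix confirmed in the comments can be sketched with plain ggplot() and geom_col(). The likely cause is that geom = "histogram" goes through stat_bin(), which computes y itself and so rejects a y aesthetic. The `stat` argument of qplot() is deprecated, so stat = "identity" no longer overrides that. The data frame below is a hypothetical stand-in with the same columns as topic.props.df (document, topic, value), so the snippet runs without the lda sampler. With the real data, only the plotting call is needed.

```r
library(ggplot2)

# Hypothetical stand-in for topic.props.df from the question:
# one row per (document, topic) pair, `value` is the topic proportion.
set.seed(1)
N <- 9   # number of documents
K <- 10  # number of topics
props <- matrix(runif(N * K), nrow = N)
props <- props / rowSums(props)  # each document's proportions sum to 1

topic.props.df <- data.frame(
  document = factor(rep(1:N, times = K)),
  topic    = factor(rep(paste("topic", 1:K), each = N),
                    levels = paste("topic", 1:K)),
  value    = as.vector(props)  # column-major: all documents for topic 1, then topic 2, ...
)

# geom_col() draws bars whose height is the y value itself (no binning),
# so supplying a y aesthetic is valid here.
p <- ggplot(topic.props.df, aes(x = topic, y = value * 100, fill = topic)) +
  geom_col() +
  ylab("proportion (%)") +
  theme(axis.text.x = element_text(angle = 0, hjust = 1, size = 12)) +
  coord_flip() +
  facet_wrap(~ document, ncol = 3)

print(p)
```

Keeping the qplot() call and changing only geom = "histogram" to geom = "col" is the variant reported to work above. The explicit ggplot() form simply avoids the deprecated `stat` argument.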
