I’m new to topic modelling.

So I hope someone experienced can answer my questions. Here's a simplified description of my data:

1. I have a CSV file with 1000 rows and 2 columns (the documents cover a mixture of topics).
2. Each row holds a document and its document ID. A document can span multiple lines, for example: "the movie is about Harry Potter. I like to watch."
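To make the assumed layout concrete, here is a minimal sketch (Python standard library only) that reads such a CSV and splits each document into lowercase tokens. The column names `doc_id` and `text`, the sample rows, and the tiny stop-word list are hypothetical placeholders, not the actual data; a real pipeline would use a fuller stop-word list and possibly stemming.

```python
import csv
import io
import re

# Hypothetical sample in the assumed two-column layout (doc_id, text).
# The first document spans two lines inside a quoted CSV field.
SAMPLE_CSV = """doc_id,text
1,"The movie is about Harry Potter.
I like to watch."
2,"Harry Potter is a wizard."
"""

# Illustrative stop-word list only; real projects use a much longer one.
STOP_WORDS = {"the", "is", "about", "i", "to", "a"}


def tokenize(text):
    """Lowercase the text, keep runs of letters, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def load_documents(handle):
    """Return {doc_id: tokens} from a CSV with doc_id and text columns."""
    reader = csv.DictReader(handle)
    return {row["doc_id"]: tokenize(row["text"]) for row in reader}


docs = load_documents(io.StringIO(SAMPLE_CSV))
print(docs["1"])  # ['movie', 'harry', 'potter', 'like', 'watch']
```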

I want to find the natural clusters (topics) with a topic model and then manually label each cluster based on its top terms.

So I split each document into individual tokens, fitted an LDA model, and chose the number of topics with the lowest perplexity score.
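For reference, perplexity is commonly defined over a set of $M$ held-out documents as below, where $\mathbf{w}_d$ are the words of document $d$ and $N_d$ is its token count. A model is fitted for each candidate number of topics $k$, and the $k$ with the lowest value is kept. It is usually computed on held-out documents rather than the training set, since training-set perplexity tends to keep falling as $k$ grows.

```latex
\operatorname{perplexity}(D_{\text{test}})
  = \exp\!\left( - \frac{\sum_{d=1}^{M} \log p(\mathbf{w}_d)}{\sum_{d=1}^{M} N_d} \right),
\qquad
\hat{k} = \arg\min_{k} \operatorname{perplexity}_k(D_{\text{test}})
```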

After fitting LDA, I plotted the most frequently occurring terms for each topic.
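Labelling topics by their top terms amounts to sorting each row of the topic-term probability matrix. The sketch below assumes the fitted model has been exported as a plain list of rows (one row of P(term | topic) per topic) plus an aligned vocabulary list; the two-topic matrix is hypothetical, for illustration only. Choosing a human-readable label, such as "Harry Potter" for the first topic, remains a manual judgment.

```python
def top_terms(topic_word, vocab, n=3):
    """Return the n highest-probability terms for each topic.

    topic_word: one row per topic, each a list of P(term | topic)
    vocab: terms aligned with the columns of topic_word
    """
    result = []
    for row in topic_word:
        # Rank column indices by probability, highest first.
        ranked = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)
        result.append([vocab[j] for j in ranked[:n]])
    return result


# Hypothetical 2-topic, 5-term model, purely for illustration.
vocab = ["harry", "potter", "movie", "watch", "wizard"]
topic_word = [
    [0.40, 0.35, 0.05, 0.05, 0.15],
    [0.05, 0.02, 0.45, 0.40, 0.08],
]

for k, terms in enumerate(top_terms(topic_word, vocab)):
    print(f"topic {k}: {', '.join(terms)}")
# topic 0: harry, potter, wizard
# topic 1: movie, watch, wizard
```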

However, I have a few doubts:

1. I'm not sure whether I should use bigrams (or n-grams more generally), and if so, how to do it. I know that some terms must occur together.
2. Do I have to use a network graph to see how the different terms correlate with each other, or how the different topics link together?
3. I'm not sure whether I'm going about this the right way.
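One common approach to the first two doubts, though not the only one, is sketched below under stated assumptions: count bigrams across documents, merge frequent ones (e.g. `harry_potter`) into single tokens before fitting LDA, and count document-level term co-occurrences as a weighted edge list that a graph tool could draw. The `min_count` threshold is a hypothetical choice; many pipelines rank bigrams with an association score such as PMI instead of raw counts. A network graph is optional, not required for LDA. In R, tidytext's `unnest_tokens()` with `token = "ngrams", n = 2` performs the bigram split.

```python
from collections import Counter
from itertools import combinations


def ngrams(tokens, n=2):
    """Join each run of n consecutive tokens into one feature, e.g. 'harry_potter'."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_bigrams(docs, min_count=2):
    """Count bigrams over all documents and keep those seen at least min_count times."""
    counts = Counter(bg for tokens in docs for bg in ngrams(tokens, 2))
    return {bg: c for bg, c in counts.items() if c >= min_count}


def merge_bigrams(tokens, keep):
    """Replace each kept bigram with one joined token, scanning left to right."""
    out, i = [], 0
    while i < len(tokens):
        pair = "_".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and pair in keep:
            out.append(pair)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def cooccurrence_edges(docs):
    """Count how many documents contain each unordered pair of distinct terms."""
    edges = Counter()
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical tokenized documents.
docs = [
    ["movie", "harry", "potter", "like", "watch"],
    ["harry", "potter", "wizard"],
]
keep = frequent_bigrams(docs)
print(keep)                               # {'harry_potter': 2}
print(merge_bigrams(docs[0], keep))       # ['movie', 'harry_potter', 'like', 'watch']
print(cooccurrence_edges(docs)[("harry", "potter")])  # 2
```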

R_abcdefg
  • I'm voting to close this question as off-topic because it is not at all about programming – camille Jun 13 '18 at 12:45
  • @camille, it's about programming, because I would appreciate it if someone could guide me through the R steps for the subsequent analysis – R_abcdefg Jun 13 '18 at 16:29
  • Please [see here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for how to post an R question that folks can answer. That includes posting data and code that you've already written, and a detailed, specific question you're trying to solve. What you're looking for is a broader tutorial, which SO can't provide – camille Jun 13 '18 at 16:35

0 Answers