I’m new to topic modelling.
So I hope someone experienced can answer my queries. Here’s a simplified description of my data:
1. I have a CSV file of dimension 1000*2 (a mixture of topics).
2. Each row is a document ID and a document. Each document can have multiple lines, and can be something like: "The movie is about Harry Potter. I like to watch."
So, I wanted to find the natural clusters/topics using a topic model, and manually assign labels to the clusters based on their top terms.
So I split each document into individual tokens and used LDA, then picked the number of topics that gave the lowest perplexity score.
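The tokenization step above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the exact code I ran: the sample rows, the `tokenize` helper and the regex are all assumptions made up for the example, and the resulting counts would still need to be passed to an LDA implementation.

```python
import re
from collections import Counter

# Hypothetical sample rows: (document ID, document text), mirroring the 1000*2 CSV.
rows = [
    (1, "The movie is about Harry Potter.\nI like to watch."),
    (2, "Harry Potter is a wizard. The movie is long."),
]

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens (assumed rule)."""
    return re.findall(r"[a-z]+", text.lower())

# One token list and one bag-of-words count per document.
tokens_by_doc = {doc_id: tokenize(text) for doc_id, text in rows}
bow_by_doc = {doc_id: Counter(toks) for doc_id, toks in tokens_by_doc.items()}
```

Multi-line documents are handled because the regex ignores line breaks; stop-word removal is left out to keep the sketch short.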
After running LDA, I plotted visualizations of the most frequently occurring terms for each topic.
However:
1. I’m not sure if I should use bi-grams/n-grams. If so, how do I do it? I know there are some terms that must occur together.
2. Do I have to use a network graph to see how the different terms correlate with each other, or how the different topics link together?
3. I’m not too sure if I’m doing this the right way.
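To make the first question concrete: by bi-grams I mean joining adjacent word pairs that occur together often into a single token before running LDA. Below is a minimal sketch of that idea, assuming a simple raw-count threshold. The `merge_frequent_bigrams` helper, the `min_count` parameter and the sample documents are hypothetical and only for illustration; I don't know if this is the recommended approach.

```python
from collections import Counter

# Hypothetical tokenized documents (one token list per document).
docs = [
    ["the", "movie", "is", "about", "harry", "potter"],
    ["harry", "potter", "is", "a", "wizard"],
    ["i", "like", "the", "movie"],
]

def merge_frequent_bigrams(docs, min_count=2):
    """Join adjacent word pairs seen at least min_count times into one token."""
    # Count every adjacent pair across the whole corpus.
    pair_counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    frequent = {pair for pair, n in pair_counts.items() if n >= min_count}
    merged_docs = []
    for doc in docs:
        merged, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                merged.append(doc[i] + "_" + doc[i + 1])
                i += 2  # skip the second word of the merged pair
            else:
                merged.append(doc[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs

merged = merge_frequent_bigrams(docs)
```

With this threshold "harry potter" becomes the single token `harry_potter`, but so does a common pair like "the movie", so stop words would probably need to be removed first or the pairs scored in a smarter way.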