
I want to visualize topic modeling done with the LDA algorithm. I use the Python module pyLDAvis inside a Jupyter notebook.

import pyLDAvis.sklearn
...
pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer)
pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer, mds='mmds')
pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer, mds='tsne')
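For context, here is a minimal end-to-end sketch of how `lda_tf`, `dtm_tf` and `tf_vectorizer` are typically built with scikit-learn before the `prepare` call (the corpus and parameter values are illustrative assumptions, not from the question):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny illustrative corpus (assumption; any list of documents works)
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "the stock market fell today",
    "investors sold shares on the market",
]

tf_vectorizer = CountVectorizer(stop_words="english")
dtm_tf = tf_vectorizer.fit_transform(docs)          # document-term matrix

lda_tf = LatentDirichletAllocation(n_components=2, random_state=0)
lda_tf.fit(dtm_tf)

# Visualization step; the pyLDAvis.sklearn module path matches the question
# (pyLDAvis < 3.4 -- newer releases renamed it), so guard the import:
try:
    import pyLDAvis.sklearn
    vis = pyLDAvis.sklearn.prepare(lda_tf, dtm_tf, tf_vectorizer, mds="pcoa")
except ImportError:
    vis = None
```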

It works fine, but I don't really understand the mds parameter, even after reading the documentation:

mds : function or a string representation of function

A function that takes topic_term_dists as an input and outputs a n_topics by 2 distance matrix. The output approximates the distance between topics. See js_PCoA() for details on the default function. A string representation currently accepts pcoa (or upper case variant), mmds (or upper case variant) and tsne (or upper case variant), if sklearn package is installed for the latter two.

Does somebody know the differences between mds='pcoa', mds='mmds', and mds='tsne'?

Thanks!


2 Answers


All three options perform dimension reduction via Jensen-Shannon Divergence combined with one of:

  • pcoa: Principal Coordinate Analysis (aka Classical Multidimensional Scaling)
  • mmds: Metric Multidimensional Scaling
  • tsne: t-distributed Stochastic Neighbor Embedding
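To make the pipeline concrete, here is a minimal NumPy sketch (not pyLDAvis's actual code) of the default pcoa path: Jensen-Shannon divergences between topic-term distributions, then classical MDS via eigendecomposition of the double-centered distance matrix:

```python
import numpy as np

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def pcoa_2d(dist):
    """Classical MDS / PCoA: embed a distance matrix into 2 dimensions."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    B = -0.5 * J @ (dist ** 2) @ J           # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:2]      # top-2 eigenpairs
    return eigvecs[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))

# Toy topic-term distributions (4 topics over 5 terms; rows sum to 1)
topic_term_dists = np.array([
    [0.40, 0.30, 0.10, 0.10, 0.10],
    [0.35, 0.35, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.50, 0.30, 0.10],
    [0.10, 0.10, 0.10, 0.10, 0.60],
])
n = len(topic_term_dists)
dist = np.array([[jensen_shannon(topic_term_dists[i], topic_term_dists[j])
                  for j in range(n)] for i in range(n)])
coords = pcoa_2d(dist)   # n_topics x 2 coordinates, like the mds output
```

In the resulting 2-D coordinates, topics with similar term distributions (the first two rows above) end up close together, which is exactly what the inter-topic distance map in pyLDAvis shows.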


Simply put: text data, when transformed into numeric tabular data, is usually high-dimensional. On the other hand, visualizations on a screen are two-dimensional (2D). Thus, a method of dimension reduction is required to bring the number of dimensions down to 2.

mds stands for multidimensional scaling. The possible values of that argument are:

  • mmds (Metric Multidimensional Scaling),
  • tsne (t-distributed Stochastic Neighbor Embedding), and
  • pcoa (Principal Coordinate Analysis).

All of them are dimension reduction methods.

Another method of dimension reduction that may be more familiar to you but is not listed above is PCA (principal component analysis). They all share the same basic idea of reducing dimensionality without losing too much information, but are backed by different theories and implementations.
