I want to visualize the similarity of text documents. I am using scikit-learn's TfidfVectorizer as `tfidf = TfidfVectorizer(decode_error='ignore', max_df=3).fit_transform(data)` and then computing the cosine similarity as `cosine_similarity = (tfidf*tfidf.T).toarray()`. This gives a similarity matrix, but `sklearn.manifold.MDS` needs a dissimilarity matrix. When I pass `1-cosine_similarity`, the diagonal values, which should be zero, are not zero; they are small values like 1.12e-9. Two questions:
1) How do I use a similarity matrix for MDS, or how do I convert my similarity matrix to a dissimilarity matrix?
2) In MDS, there is an option `dissimilarity`, whose value can be `'precomputed'` or `'euclidean'`. What's the difference between the two? When I pass `'euclidean'`, the MDS coordinates come out the same regardless of whether I use `cosine_similarity` or `1-cosine_similarity`, which looks wrong.
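
For reference, here is a minimal sketch of the setup described above. The toy corpus is a hypothetical stand-in for my `data`, and the clipping, symmetrizing, and zeroing of the diagonal is only one workaround I could imagine for the rounding noise, not something I know to be the recommended approach. The `dissimilarity` parameter name is as in my scikit-learn version; newer releases may name it differently.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import MDS

# Hypothetical toy corpus standing in for `data`.
data = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "stock markets fell sharply today",
]

# Rows are L2-normalized by default, so the dot product is the cosine similarity.
tfidf = TfidfVectorizer(decode_error='ignore', max_df=3).fit_transform(data)
cosine_similarity = (tfidf @ tfidf.T).toarray()

# Assumed workaround: turn similarity into dissimilarity and remove
# floating-point noise (tiny negatives, slight asymmetry, nonzero diagonal).
dissimilarity = 1.0 - cosine_similarity
dissimilarity = np.clip(dissimilarity, 0.0, None)
dissimilarity = (dissimilarity + dissimilarity.T) / 2.0
np.fill_diagonal(dissimilarity, 0.0)

# 'precomputed' tells MDS that the input already is a dissimilarity matrix.
mds = MDS(n_components=2, dissimilarity='precomputed', random_state=0)
coords = mds.fit_transform(dissimilarity)
print(coords.shape)
```

With `dissimilarity='precomputed'`, the matrix passed in is used directly as the pairwise dissimilarities between documents.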
Thanks!