When building a Python gensim Word2Vec model, is there a way to see a doc-to-word matrix?
Given the input sentences = [['first', 'sentence'], ['second', 'sentence']]
I'd expect to see something like*:
      first  second  sentence
doc0  1      0       1
doc1  0      1       1
*I've shown it in a human-readable form, but I'm looking for a SciPy (or other) matrix indexed to model.wv.index2word.
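As far as I know, a trained Word2Vec model keeps the vocabulary and the word vectors but not the training corpus, so a doc-to-word matrix has to be rebuilt from the same sentences the model was trained on. Below is a minimal sketch under that assumption; `doc_word_matrix` is a hypothetical helper, and `vocab` stands in for model.wv.index2word (gensim 3.x; the attribute is model.wv.index_to_key in gensim 4.x). Column order simply follows whatever list you pass, and words missing from it (for example, ones dropped by Word2Vec's default min_count=5) are skipped.

```python
from scipy.sparse import csr_matrix


def doc_word_matrix(sentences, index2word):
    """Build a (docs x words) count matrix whose columns follow index2word.

    Hypothetical helper: index2word would normally be model.wv.index2word
    (gensim 3.x) or model.wv.index_to_key (gensim 4.x).
    """
    # Map each vocabulary word to its column index.
    word_to_col = {word: i for i, word in enumerate(index2word)}
    rows, cols, data = [], [], []
    for doc_id, tokens in enumerate(sentences):
        for token in tokens:
            col = word_to_col.get(token)
            if col is None:
                continue  # out-of-vocabulary word: skip it
            rows.append(doc_id)
            cols.append(col)
            data.append(1)
    # Duplicate (row, col) pairs are summed during construction,
    # so repeated words in a document become counts.
    return csr_matrix((data, (rows, cols)),
                      shape=(len(sentences), len(index2word)))


sentences = [['first', 'sentence'], ['second', 'sentence']]
vocab = ['first', 'second', 'sentence']  # stand-in for model.wv.index2word
X = doc_word_matrix(sentences, vocab)
print(X.toarray())
# [[1 0 1]
#  [0 1 1]]
```

Note that gensim orders index2word by descending frequency, so with a real model 'sentence' would likely be the first column.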
And can that be transformed into a word-to-word matrix (to see co-occurrences)? Something like:
          first  second  sentence
first     1      0       1
second    0      1       1
sentence  1      1       2
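The word-to-word matrix in the example is the product Xᵀ·X of a binary doc-to-word matrix X with itself: entry (i, j) counts the documents containing both word i and word j, and the diagonal holds each word's document frequency. A minimal sketch, assuming SciPy is available; `cooccurrence_matrix` is a hypothetical helper and the vocabulary list again stands in for model.wv.index2word.

```python
from scipy.sparse import csr_matrix


def cooccurrence_matrix(sentences, index2word):
    """Word x word matrix: entry (i, j) = number of docs containing both words.

    Hypothetical helper; index2word stands in for model.wv.index2word.
    """
    word_to_col = {word: i for i, word in enumerate(index2word)}
    rows, cols = [], []
    for doc_id, tokens in enumerate(sentences):
        # set(): record presence only, so a repeated word doesn't inflate counts
        for token in set(tokens):
            if token in word_to_col:
                rows.append(doc_id)
                cols.append(word_to_col[token])
    # Binary (docs x words) matrix
    X = csr_matrix(([1] * len(rows), (rows, cols)),
                   shape=(len(sentences), len(index2word)))
    # (words x docs) @ (docs x words) -> (words x words)
    return (X.T @ X).tocsr()


C = cooccurrence_matrix([['first', 'sentence'], ['second', 'sentence']],
                        ['first', 'second', 'sentence'])
print(C.toarray())
# [[1 0 1]
#  [0 1 1]
#  [1 1 2]]
```

Binarizing first is a design choice: with raw counts, the diagonal would become a sum of squared counts rather than a document frequency. If you only care about pairs of distinct words, C.setdiag(0) followed by C.eliminate_zeros() drops the self-counts.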
I've already implemented something like a word-word co-occurrence matrix using CountVectorizer, and it works well. However, I'm already using gensim in my pipeline, and speed and code simplicity matter for my use case.