How to extract vocabulary vectors from gensim's word2vec?

Question

I want to analyze the vectors, looking for patterns, and then use an SVM on them to complete a supervised classification task between class A and class B. (I know it may sound odd, but it's our homework.) So I really need to know:

1. How do I extract the encoded vectors of a document using a trained model?

2. How do I interpret them, and how does word2vec encode them?

I'm using gensim's word2vec.

If you are trying to categorize whole documents, you should check out the Doc2Vec model, which is also available in the gensim library. The (slightly outdated) tutorial is here: https://rare-technologies.com/doc2vec-tutorial/ and be sure to check my answer here for an up-to-date version: http://stackoverflow.com/questions/31321209/doc2vec-how-to-get-document-vectors/39329194#39329194 - Lenka Vraná, May 11 '17 at 15:25

score 2 · Answer 1 · answered May 15 '17 at 10:16

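If you have a trained word2vec model, you can get the vector for a word with the __getitem__ method. In current gensim versions the lookup goes through the model's wv attribute; older versions also accepted model["word"] directly.

import gensim
model = gensim.models.Word2Vec(sentences)  # sentences: iterable of token lists
print(model.wv["some_word_from_dictionary"])  # vector for one vocabulary word

Unfortunately, the embeddings from word2vec/doc2vec are not directly interpretable by a person (in contrast to the topic vectors from LdaModel).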
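Although the individual dimensions have no readable meaning, the vectors are usually compared with each other, most often by cosine similarity. The following is only a minimal sketch of that comparison: the three-dimensional vectors are made up for illustration and do not come from any trained model, and cosine_similarity is a hypothetical helper name.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Toy vectors, invented for illustration only
king = [0.9, 0.1, 0.3]
queen = [0.8, 0.2, 0.35]
banana = [-0.2, 0.9, -0.5]

print(cosine_similarity(king, queen))   # close to 1: similar direction
print(cosine_similarity(king, banana))  # negative: dissimilar direction
```

With a trained gensim model, model.wv.similarity(w1, w2) and model.wv.most_similar(word) provide this kind of comparison directly.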

P.S. If the objects in your task are texts (whole documents), you should use the Doc2Vec model.
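If you still want to build document features from the per-word lookup, one common baseline is to average the vectors of the words in each document. This is not what the answer recommends (that is Doc2Vec); it is only a sketch under the assumption that simple averaging is acceptable for the task. document_vector and toy_vectors are hypothetical names, and a plain dict stands in for the trained model's vectors.

```python
def document_vector(tokens, word_vectors):
    """Average the vectors of all tokens found in word_vectors.

    word_vectors can be any object supporting `in` and `[]`.
    Returns None when none of the tokens is known.
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(float(v[i]) for v in known) / len(known) for i in range(dim)]

# Toy lookup table, invented for illustration only
toy_vectors = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}

doc = ["cat", "dog", "unicorn"]  # "unicorn" is out of vocabulary and skipped
print(document_vector(doc, toy_vectors))  # [0.5, 0.5]
```

With a trained model, model.wv could be passed as word_vectors, since it supports both the membership test and the lookup. One averaged vector per document, together with its class label, is the kind of input an SVM classifier expects.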
