I have a set of documents that describe different dimensions of corporate culture. Tokenized examples below:
sent1=['innovative','culture','fast','moving','company']
sent2=['manager','micromanage','all','time']
sent3=['slow','response','customer']
I've already applied GloVe and Gensim word2vec to the above documents. I'd like to identify documents that have a high cosine similarity to a set of words, such as
Innovation =['innovate','innovative','fast']
How do I calculate the cosine similarity between each document (e.g. sent1, sent2) and Innovation using Gensim?
Ideal output:
        innovation
sent1   0.98
sent2   0.45
sent3   -0.2
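
To make the target computation concrete, here is a minimal sketch of one common way to score a document against a word set: average the word vectors on each side, then take the cosine of the two mean vectors. Everything in it is an assumption for illustration. The 3-dimensional `toy_vectors` are made-up numbers standing in for a trained model's vectors, and the helper names (`mean_vector`, `cosine`, `doc_similarity`) are hypothetical. With a trained Gensim model, I believe `model.wv.n_similarity(sent1, Innovation)` performs the same mean-then-cosine calculation, although it raises a `KeyError` for out-of-vocabulary words rather than skipping them.

```python
import math

# Hypothetical toy vectors (made-up numbers), standing in for the word vectors
# of a trained model. With Gensim these would be looked up in model.wv.
toy_vectors = {
    "innovative":  [0.9, 0.1, 0.0],
    "innovate":    [0.8, 0.2, 0.0],
    "fast":        [0.7, 0.0, 0.3],
    "culture":     [0.4, 0.4, 0.2],
    "moving":      [0.6, 0.1, 0.3],
    "company":     [0.3, 0.5, 0.2],
    "manager":     [0.0, 0.9, 0.1],
    "micromanage": [-0.2, 0.9, 0.0],
    "time":        [0.1, 0.3, 0.6],
    "slow":        [-0.8, 0.0, 0.2],
    "response":    [-0.1, 0.2, 0.7],
    "customer":    [0.0, 0.3, 0.7],
}


def mean_vector(tokens, vectors):
    """Average the vectors of in-vocabulary tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def doc_similarity(doc_tokens, word_set, vectors):
    """Cosine between the mean vector of a document and the mean vector of a word set."""
    doc_vec = mean_vector(doc_tokens, vectors)
    set_vec = mean_vector(word_set, vectors)
    if doc_vec is None or set_vec is None:
        return None
    return cosine(doc_vec, set_vec)


docs = {
    "sent1": ["innovative", "culture", "fast", "moving", "company"],
    "sent2": ["manager", "micromanage", "all", "time"],  # "all" has no vector and is skipped
    "sent3": ["slow", "response", "customer"],
}
innovation = ["innovate", "innovative", "fast"]

# One score per document, keyed by document name.
scores = {name: doc_similarity(tokens, innovation, toy_vectors)
          for name, tokens in docs.items()}

for name, score in scores.items():
    print(f"{name}\t{score:.2f}")
```

The scores depend entirely on the toy numbers, so only their ordering is meaningful here. Averaging is just one choice of document representation; weighting words (for example by TF-IDF) before averaging is a common variation.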