I have two sentences:
sent1 = "This work has been completed by Christopher Pan"
sent2 = "This job has been finished by Mark Spencer"
I calculated the similarity of the two sentences using Word2vec, with this helper function:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def avg_sentence_vector(words, model, num_features, index2word_set):
    # Average the vectors of all words that are in the model's vocabulary.
    featureVec = np.zeros((num_features,), dtype="float32")
    nwords = 0
    for word in words:
        if word in index2word_set:
            nwords = nwords + 1
            featureVec = np.add(featureVec, model[word])
    # Avoid dividing by zero when no word was found in the vocabulary.
    if nwords > 0:
        featureVec = np.divide(featureVec, nwords)
    return featureVec
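which I call as follows:
# Vocabulary set, assuming a gensim Word2Vec model (gensim < 4 uses .wv.index2word instead).
index2word_set = set(word2vec_model.wv.index_to_key)
sent1_avg_vector = avg_sentence_vector(sent1.split(), model=word2vec_model, num_features=100, index2word_set=index2word_set)
sent2_avg_vector = avg_sentence_vector(sent2.split(), model=word2vec_model, num_features=100, index2word_set=index2word_set)
# cosine_similarity expects 2D arrays, so each vector is reshaped to a single row.
sen1_sen2_similarity = cosine_similarity(sent1_avg_vector.reshape(1, -1), sent2_avg_vector.reshape(1, -1))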
I would like to know how I can build a semantic tree which can tell me that:
- completed and finished are similar words;
- work and job are similar words too;
- then, if I find work/job or finished/completed in a sentence, these words are both connected with Christopher and Mark.
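To make the goal concrete, here is a rough sketch of the kind of result I have in mind. It is only an illustration under assumptions: TOY_VECTORS is a hypothetical set of hand-written vectors standing in for the real Word2vec lookups, NAMES is a hard-coded set of person names (a real pipeline would need some way to detect them), and the 0.9 threshold is arbitrary.

```python
import math

# Hypothetical toy vectors standing in for word2vec_model lookups.
TOY_VECTORS = {
    "work": (0.9, 0.1, 0.0),
    "job": (0.8, 0.2, 0.0),
    "completed": (0.1, 0.9, 0.0),
    "finished": (0.2, 0.8, 0.0),
}

# Assumed to be known in advance; not detected automatically here.
NAMES = {"Christopher", "Pan", "Mark", "Spencer"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similar_groups(vectors, threshold=0.9):
    """Greedily put each word into the first group containing a similar word."""
    groups = []
    for word, vec in vectors.items():
        for group in groups:
            if any(cosine(vec, vectors[other]) >= threshold for other in group):
                group.add(word)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append({word})
    return groups


def link_groups_to_names(sentences, groups, names):
    """Map each group to the names found in sentences that use one of its words."""
    links = {}
    for group in groups:
        linked = set()
        for sentence in sentences:
            tokens = set(sentence.split())
            if group & tokens:
                linked |= names & tokens
        links[frozenset(group)] = linked
    return links


sent1 = "This work has been completed by Christopher Pan"
sent2 = "This job has been finished by Mark Spencer"

groups = similar_groups(TOY_VECTORS)
links = link_groups_to_names([sent1, sent2], groups, NAMES)
for group, people in links.items():
    print(sorted(group), "->", sorted(people))
```

With these toy vectors, work/job end up in one group and completed/finished in another, and both groups are linked to the names from both sentences. What I am missing is how to get this kind of structure from a real model instead of hand-written data.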
I do not know whether there is something in Python that would let me get such results. I would appreciate it if you could point me in the right direction.
Thanks