
Below is some sample code. I'm trying to get vector representations for all the words in newfile.txt (a file containing a news article). I would like to know whether model.getVectors().keys() returns all the keys (the distinct words in the file) or whether it limits the output to a certain number.

Currently I get only a few words as keys, even though my input contains many. How does it work?

from pyspark.mllib.feature import Word2Vec

# sc is the SparkContext provided by the PySpark shell
doc = sc.textFile('newfile.txt').map(lambda line: line.split(" "))

model = Word2Vec().fit(doc)

model.getVectors().keys()

1 Answer


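I found the answer: not all of the keys (words) were listed because of the model's minimum-count parameter, set with setMinCount(), which has a default value of 5. Words that occur fewer than 5 times in the input are left out of the vocabulary, so they never appear in getVectors().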

From the documentation

Thanks for the help!
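To make the effect of the minimum count concrete, here is a minimal pure-Python sketch of the filtering idea. It is not Spark's implementation: the helper surviving_vocabulary and the sample_lines data are hypothetical stand-ins, and the only things taken from the posts above are the split on " " and the default threshold of 5.

```python
from collections import Counter


def surviving_vocabulary(lines, min_count=5):
    """Return the words that occur at least min_count times across all lines.

    Hypothetical helper: it only illustrates the frequency cutoff,
    not how Spark builds its vocabulary internally.
    """
    # Tokenize the same way as the question: split each line on a single space.
    counts = Counter(word for line in lines for word in line.split(" "))
    return {word for word, n in counts.items() if n >= min_count}


# Hypothetical stand-in for the contents of newfile.txt.
sample_lines = ["the market fell"] + ["the the"] * 2 + ["shares rose"]

# With the default threshold, only words seen at least 5 times remain.
frequent_words = surviving_vocabulary(sample_lines)

# With a threshold of 1, every distinct word is kept.
all_words = surviving_vocabulary(sample_lines, min_count=1)
```

In the question's code, the corresponding change would be to lower the threshold before fitting, for example Word2Vec().setMinCount(1).fit(doc), assuming you really want vectors for words that appear only once.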

desertnaut