Below is some sample code. I'm trying to get vector representations for all the words in newfile.txt (a file containing a news article). I would like to know whether model.getVectors().keys() outputs all the keys (the distinct words in the file) or whether it limits the output to a certain number.
Currently I get only a few words as keys, even though my input contains many distinct words. How does it work?
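```python
from pyspark import SparkContext
from pyspark.mllib.feature import Word2Vec

sc = SparkContext.getOrCreate()
doc = sc.textFile('newfile.txt').map(lambda line: line.split(" "))  # one list of words per line
model = Word2Vec().fit(doc)
model.getVectors().keys()
```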
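One hypothesis worth checking, assuming the model drops words that occur fewer than a minimum number of times: pyspark.mllib's Word2Vec exposes such a threshold through `setMinCount`, with a default of 5. A minimal plain-Python sketch to compare the number of distinct words against the number that would survive that threshold follows. It tokenizes the same way as the code above (`line.split(" ")`). `MIN_COUNT`, `vocab_after_min_count` and the sample lines are hypothetical stand-ins, not part of the original code.

```python
from collections import Counter

# Hypothetical threshold, assumed to mirror the Word2Vec default (minCount=5).
MIN_COUNT = 5


def vocab_after_min_count(lines, min_count=MIN_COUNT):
    """Return (all distinct words, words seen at least min_count times).

    Tokenizes exactly like the question's code: line.split(" ").
    """
    counts = Counter(word for line in lines for word in line.split(" "))
    kept = {word for word, n in counts.items() if n >= min_count}
    return set(counts), kept


if __name__ == "__main__":
    # Hypothetical stand-in for the lines of newfile.txt.
    sample = [
        "the market rose today",
        "the market fell yesterday",
        "the market is volatile",
        "the analysts said the market would recover",
    ]
    distinct, kept = vocab_after_min_count(sample)
    print(len(distinct), "distinct words;", len(kept), "would be kept:", sorted(kept))
```

If most words in the article fall below the threshold, one way to test the hypothesis in Spark would be to fit with `Word2Vec().setMinCount(1).fit(doc)` and compare the number of keys.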