I have this piece of code:
import gensim
import random
file = open('../../../dataset/output/interaction_jobroles_titles_tags.txt')
read_data = file.read()
data = read_data.split('\n')
sentences = [line.split() for line in data]
print(len(sentences))
print(sentences[1])
model = gensim.models.Word2Vec(min_count=1, window=10, size=300, negative=5)
model.build_vocab(sentences)
for epoch in range(5):
shuffled_sentences = random.shuffle(sentences)
model.train(shuffled_sentences)
print(epoch)
print(model)
model.save("../../../dataset/output/wordvectors_jobroles_titles_300d_10w_wordshuffling" + '.model')
If I print a single sentence, then it output is something like this:
['JO_3787672', 'JO_272304', 'JO_2027410', 'TI_2969041', 'TI_2509936', 'TA_954638', 'TA_4321623', 'TA_339347', 'TA_272304', 'TA_3017535', 'TA_494116', 'TA_798840']
What I need is to shuffle the words before training and then save the model.
I am not sure whether I am coding it in a right way. I end up with exception:
Exception in thread Thread-8:
Traceback (most recent call last):
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/threading.py", line 914, in _bootstrap_inner
self.run()
File "/usr/local/Cellar/python3/3.5.1/Frameworks/Python.framework/Versions/3.5/lib/python3.5/threading.py", line 862, in run
self._target(*self._args, **self._kwargs)
File "/usr/local/lib/python3.5/site-packages/gensim/models/word2vec.py", line 747, in job_producer
for sent_idx, sentence in enumerate(sentences):
File "/usr/local/lib/python3.5/site-packages/gensim/utils.py", line 668, in __iter__
for document in self.corpus:
TypeError: 'NoneType' object is not iterable
I would like to ask you how can I shuffle words.