I have a corpus of text. As a preprocessing step I've vectorized all of the text using gensim Word2Vec. I don't understand what exactly I'm doing wrong. As a starting point I used this discussion (and good tutorial): predict next word. Code: Source code.
As input I have lines of sentences. I want to take each line, take word[0] of that line and predict word[1]. Then, using word[0] and word[1], predict word[2], and so on to the end of the line.
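To illustrate what I mean, here is a minimal sketch of the incremental scheme I'm after, using gensim's Word2Vec.predict_output_word (which only works for models trained with negative sampling). The model path and the example line are placeholders, not my real data:

# Minimal sketch of the intended incremental prediction, gensim only.
# predict_output_word requires a model trained with negative sampling.
from gensim.models import Word2Vec

w2v_model = Word2Vec.load('my_word2vec.model')   # hypothetical path

line = 'the quick brown fox jumps'               # placeholder input line
words = line.split()
context = [words[0]]                             # seed with word[0]
for _ in range(len(words) - 1):                  # one prediction per remaining word
    candidates = w2v_model.predict_output_word(context, topn=1)
    if not candidates:                           # context words missing from vocab
        break
    context.append(candidates[0][0])             # grow the context with the prediction
print(' '.join(context))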
In the tutorial, a fixed number of words is predicted each time. What I do:
def on_epoch_end(epoch, _):
    print('\nGenerating text after epoch: %d' % epoch)
    for sentence in inpt:
        word_first = sentence.split()[0]                   # seed with word[0]
        sample = generate_next(word_first, len(sentence))  # pass the sentence length instead of num_generated
        print('%s... -> %s' % (word_first, sample))
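For reference, generate_next is the one from the tutorial, which looks roughly like this (paraphrased from memory; model, word2idx, idx2word and the temperature-based sample() helper all come from the tutorial's source):

import numpy as np

def generate_next(text, num_generated=10):
    # encode the seed text as embedding-matrix indices
    word_idxs = [word2idx(word) for word in text.lower().split()]
    for i in range(num_generated):
        # the model returns a softmax distribution over the vocabulary
        prediction = model.predict(x=np.array(word_idxs))
        idx = sample(prediction[-1], temperature=0.7)
        word_idxs.append(idx)
    return ' '.join(idx2word(idx) for idx in word_idxs)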
I take the first word and use it to generate all of the following words, and as the second parameter I pass the length of the sentence instead of num_generated=10 as in the tutorial. But it doesn't help at all: every time the predicted sequence of words I get back has a (in my opinion) random length.
What am I doing wrong and how to fix it?