I can't seem to make sense of the data provided by Keras' Reuters dataset.
The set is loaded like so:
from keras.datasets import reuters
(x_train, y_train), (x_test, y_test) = reuters.load_data()
As far as I understand, the "x" arrays are arrays of sequences (lists) of word indices from news stories, and the "y" arrays hold the topic labels of these sequences.
But when I try to translate the word indices of one of the sequences with the provided dictionary into actual words:
# Invert the word -> index mapping so that indices can be looked up
wordDict = {y: x for x, y in reuters.get_word_index().items()}
for index in x_train[0]:
    print(wordDict.get(index))
the resulting sequence of words makes no sense. How do I turn the sequences back into the original news stories?
Edit: found a similar thread here. It seems the indices in the dictionary do not match the word indices in the dataset, but re-downloading the data does not resolve the problem for me.
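To make the suspected mismatch concrete: load_data() takes start_char=1, oov_char=2 and index_from=3 by default, so my guess is that the dataset indices are simply shifted by 3 relative to get_word_index(). Below is a minimal sketch of decoding under that assumption. The helper decode_sequence, the placeholder tokens and the toy dictionary are made up for illustration and are not part of Keras.

```python
# Sketch only: decode a Keras-style index sequence back into words.
# Assumption: load_data() was called with its defaults
# (start_char=1, oov_char=2, index_from=3), so indices 0, 1 and 2 are
# reserved and every real word index is shifted up by index_from.

def decode_sequence(sequence, word_index, index_from=3):
    """Hypothetical helper: map shifted word indices back to words."""
    # Invert word -> index and apply the same shift as the dataset.
    reverse_index = {idx + index_from: word for word, idx in word_index.items()}
    # Placeholder names for the reserved indices (my own choice).
    reverse_index[0] = "<pad>"
    reverse_index[1] = "<start>"
    reverse_index[2] = "<oov>"
    # Anything not in the mapping is treated as out-of-vocabulary.
    return " ".join(reverse_index.get(i, "<oov>") for i in sequence)

# Toy stand-in for reuters.get_word_index(), which maps word -> index.
toy_word_index = {"the": 1, "of": 2, "said": 3}
decoded = decode_sequence([1, 4, 6, 5, 2], toy_word_index)
print(decoded)  # <start> the said of <oov>
```

With the real data this would presumably be called as decode_sequence(x_train[0], reuters.get_word_index()), but I have not confirmed that the offset is the whole explanation.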