There is a similar question but the output I am looking for is different.
I have a dataframe in which each column is a word, each row is a document, and each cell holds the number of times that word occurs in that document.
As a dict, it looks like this (note that the counts are stored as strings):
{'orange': {0: '1',
1: '3'},
'blue': {0: '0',
1: '2'}}
The output should "re-create" each original document as a bag of words, repeating every word as many times as it occurs:
corpus = [
['orange'],
['orange', 'orange', 'orange', 'blue', 'blue']]
How can I do this?
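
For reference, here is a minimal sketch of the transformation I have in mind. It assumes pandas, and that the string counts can be cast to integers first; the names `df`, `counts`, and `corpus` are only illustrative. It is a plain nested loop, so there may well be a more idiomatic or faster way.

```python
import pandas as pd

# Sample data from the question; the counts are stored as strings.
df = pd.DataFrame({'orange': {0: '1', 1: '3'},
                   'blue': {0: '0', 1: '2'}})

# Cast to integers so the counts can be used as repeat factors.
counts = df.astype(int)

# For each row (document), repeat each column name (word) by its count.
corpus = [
    [word for word, n in row.items() for _ in range(n)]
    for _, row in counts.iterrows()
]

print(corpus)
# [['orange'], ['orange', 'orange', 'orange', 'blue', 'blue']]
```

Words with a count of 0 simply drop out, and the words in each document follow the column order of the dataframe.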