I have created the following dictionary from the Cranfield Collection:
{
'd1' : ['experiment', 'studi', ..., 'configur', 'experi', '.'],
'd2' : ['studi', 'high-spe', ..., 'steadi', 'flow', '.'],
...,
'd1400': ['report', 'extens', ..., 'graphic', 'form', '.']
}
Each key-value pair represents a single document: the key is the document ID and the value is a list of that document's tokenized, stemmed words with stopwords removed. I need to create an inverted index from this dictionary with the following format:
{
'experiment': {'d1': [1, [0]], ..., 'd30': [2, [12, 40]], ..., 'd123': [3, [11, 45, 67]], ...},
'studi': {'d1': [1, [1]], 'd2': [2, [0, 36]], ..., 'd207': [3, [19, 44, 59]], ...},
...
}
Here each key is a term, and its value is a dictionary that maps every document the term appears in to a two-element list: the number of times the term occurs in that document, and the positions (list indices) in that document where it occurs. I am not sure how to approach this conversion, so I am just looking for some starter pointers on how to think about this problem. Thank you.
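One possible way to think about the conversion, as a minimal sketch rather than a definitive implementation: walk each document's token list with `enumerate`, record every position under its term and document ID, and take the count from the length of the position list. The function name `build_inverted_index` and the toy `docs` data below are assumptions made up for illustration; they are not the actual Cranfield documents.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Return {term: {doc_id: [count, [positions]]}} for a {doc_id: [tokens]} dict."""
    # Intermediate structure: term -> doc_id -> list of positions.
    positions = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in docs.items():
        for pos, term in enumerate(tokens):
            positions[term][doc_id].append(pos)

    # Convert to plain dicts in the [count, [positions]] layout.
    return {
        term: {
            doc_id: [len(pos_list), pos_list]
            for doc_id, pos_list in postings.items()
        }
        for term, postings in positions.items()
    }


# Toy stand-in for the real collection (hypothetical data).
docs = {
    'd1': ['experiment', 'studi', 'flow', 'studi', '.'],
    'd2': ['studi', 'flow', '.'],
}
index = build_inverted_index(docs)
print(index['studi'])  # {'d1': [2, [1, 3]], 'd2': [1, [0]]}
```

Collecting positions first and deriving the count afterwards keeps the two values from drifting apart. This makes a single pass over every token, so the work grows linearly with the total number of tokens in the collection.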