I want to turn a list with repeated strings like
["ask","a","public","question","ask","a","public","question"]
into a dictionary that has each element of the list as a key and the indexes of its occurrences as the value:
{"ask":[0,4],"a":[1,5],"public":[2,6],"question":[3,7]}
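To make the expected output concrete, here is a minimal sketch of the mapping I am after, assuming the input is a plain Python list. The function name index_positions is only for illustration, and this is not necessarily the fastest approach:

```python
from collections import defaultdict

def index_positions(words):
    """Map each distinct word to the list of positions where it occurs."""
    positions = defaultdict(list)
    # Single pass over the list: append each index under its word.
    for i, word in enumerate(words):
        positions[word].append(i)
    return dict(positions)

words = ["ask", "a", "public", "question", "ask", "a", "public", "question"]
result = index_positions(words)
print(result)
```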
Any hints? I am actually computing the bigram perplexity of a corpus. I already have the total occurrences of each bigram, i.e., count(B|A), but now I need count(A), which should be the total number of occurrences of all two-word combinations that start with A. I took the keys of the bigram dictionary as a list and reduced it to a list containing only the first word of each bigram, such as
[['You', 'will'], ['will', 'face'], ['face', 'many'], ['many', 'defeats']]
to
['You', 'will', 'face', 'many']
So I need to count all occurrences of each word, one by one, in that bigram dictionary. I tried several data structures such as list, dict, and defaultdict, but they all took too long. I am looking for a data structure that can handle this quickly.
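For reference, the count(A) I am describing could be sketched as a single pass over the bigram dictionary, summing the counts of every bigram that starts with A. This assumes the dictionary is keyed by (first, second) tuples and maps to integer counts; the name first_word_counts and the sample counts below are made up for illustration:

```python
from collections import Counter

def first_word_counts(bigram_counts):
    """Sum bigram counts grouped by the first word of each bigram."""
    totals = Counter()
    # Each key is assumed to be a (first, second) tuple.
    for (first, _second), count in bigram_counts.items():
        totals[first] += count
    return totals

# Hypothetical bigram counts, not taken from a real corpus.
bigram_counts = {
    ("You", "will"): 2,
    ("will", "face"): 1,
    ("will", "win"): 3,
    ("face", "many"): 1,
    ("many", "defeats"): 1,
}
totals = first_word_counts(bigram_counts)
print(totals["will"])
```

This visits each bigram once, so no per-word scan of the key list is needed.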