I am trying to figure out the best data structure to use in my code. I have considered dictionaries, lists of dictionaries, classes, etc., but I am unsure which would be the fastest and most efficient.
The program I wrote opens multiple text files and selects words based on certain criteria. I then need to keep track of the unique words selected, the sentences they appear in, the files they appear in, and a count of how many times each word appears in total throughout the process.
As I iterate through the selected words, I need to check whether each one has already been added to the data structure (which will contain thousands of words).
If it has already been added, I append the file it came from and the sentence it appears in to that word's lists and increment its count.
If it is not already there, I add the word to the data structure along with its file and sentence, and initialize its count to 1.
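To make the logic concrete, here is a minimal sketch of the steps above using a plain dict keyed by word. The names (`index`, `new_entry`, `record`) and the sample file names and sentences are made up for illustration; this is not my real code, just the shape of what I need.

```python
from collections import defaultdict


def new_entry():
    # Hypothetical per-word record: total count plus parallel lists
    # of the files and sentences the word was seen in.
    return {"count": 0, "files": [], "sentences": []}


# A missing word gets a fresh entry on first access, so the
# "already added?" check and the insert are a single dict lookup.
index = defaultdict(new_entry)


def record(word, filename, sentence):
    entry = index[word]
    entry["count"] += 1
    entry["files"].append(filename)
    entry["sentences"].append(sentence)


# Made-up sample data.
record("apple", "a.txt", "I ate an apple.")
record("apple", "b.txt", "The apple fell.")
record("pear", "a.txt", "A pear is green.")

print(index["apple"]["count"])
print(index["apple"]["files"])
```

Dict lookups and inserts are constant time on average, so this is the baseline I would be comparing any trie against.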
I am not really constrained by memory, but speed is an important factor, so I am thinking that something like a C-style trie could work. However, I am not sure what the best way to implement that in Python would be.
How would you do it?