Assuming I have a dict like this:
docDict = {"alpha": ["a", "b", "c", "a", "b"], "bravo": ["b", "c", "d", "c", "d"]}
What I want to do is something like calculating "document frequency": treating each dictionary item as a document, and given a specific word, how many documents contain that word?
I've seen many posts telling me how to calculate frequency, but here if "a" appears twice in document "alpha", I just need the count to be 1. So the "frequency" of "a" should be 1, and "c" should be 2.
I know I can iterate over the whole documents dictionary and increment a counter whenever the word is found in a document. Or I can first make the words in every document unique, then combine all the documents and count the words.
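To make those two options concrete, here is a rough sketch of what I mean. The function names `doc_freq_of` and `doc_freq_all` are just ones I made up for illustration, and I'm assuming each document is a plain list of strings:

```python
from collections import Counter

docDict = {"alpha": ["a", "b", "c", "a", "b"], "bravo": ["b", "c", "d", "c", "d"]}

# Option 1: walk every document and count it once if the word is present.
def doc_freq_of(word, docs):
    return sum(1 for words in docs.values() if word in words)

# Option 2: de-duplicate each document with set(), then count across all documents.
def doc_freq_all(docs):
    df = Counter()
    for words in docs.values():
        df.update(set(words))  # each word contributes at most 1 per document
    return df

print(doc_freq_of("a", docDict))  # 1
print(doc_freq_of("c", docDict))  # 2
```

Option 1 rescans every document for each word I ask about, while option 2 builds the counts for all words in a single pass.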
But I suspect there's a better, more efficient way. Any ideas?
BTW, is there any way I can keep the structure of the dict? In this example, I'd like to get a result of {"alpha": {'c': 2, 'b': 2, 'a': 1}, "bravo": {'c': 2, 'b': 2, 'd': 1}}
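For what it's worth, the shape I'm after could probably be built along these lines. This is only a sketch: `df` and `result` are names I picked, and it assumes the document-frequency counts are computed first and then looked up per document:

```python
from collections import Counter

docDict = {"alpha": ["a", "b", "c", "a", "b"], "bravo": ["b", "c", "d", "c", "d"]}

# Document frequency: each word counted at most once per document.
df = Counter(word for words in docDict.values() for word in set(words))

# Keep the original keys; map each document's unique words to their document frequency.
result = {
    name: {word: df[word] for word in set(words)}
    for name, words in docDict.items()
}

print(result)
```

The order of the keys inside each inner dict is not guaranteed here, since they come from a set.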
Update
If I have just a list here (something like [["a", "b", "c", "a", "b"], ["b", "c", "d", "c", "d"]]), how can I get a result list like [[1, 2, 2, 0], [0, 2, 2, 1]]?
I've got no idea. The point is to expand each list and ensure the order of the terms stays consistent. Thoughts?
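To show what I mean by "expand" and "order", here is a small sketch of the output I'm describing. I'm assuming the terms are ordered alphabetically (that choice is mine, any fixed order would do), and `vocab` and `rows` are just illustrative names:

```python
from collections import Counter

docs = [["a", "b", "c", "a", "b"], ["b", "c", "d", "c", "d"]]

# Document frequency: each term counted at most once per document.
df = Counter(term for words in docs for term in set(words))

# Assumption: a fixed alphabetical term order shared by every row.
vocab = sorted(df)

rows = []
for words in docs:
    present = set(words)
    # Document frequency if the term occurs in this document, otherwise 0.
    rows.append([df[term] if term in present else 0 for term in vocab])

print(vocab)  # ['a', 'b', 'c', 'd']
print(rows)   # [[1, 2, 2, 0], [0, 2, 2, 1]]
```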