I have two (or more) dictionaries, each extracted and processed from a source document.
Each dictionary maps a word to its count (word : count).
Let us say, from document No. 1, this is the dictionary that I extract:
dic1 = {'hello' : 1, 'able' : 3, 'of' : 9, 'advance' : 2, 'occurred' : 4, 'range' : 1}
And, from document No. 2, this is the dictionary:
dic2 = {'of' : 6, 'sold' : 4, 'several' : 3, 'able' : 2, 'advance' : 1}
I want to combine the dictionaries in two ways:
1. If a word appears in more than one dictionary, add up its counts. This seems fairly doable, based on this question.
2. For every word, collect the numbers of the documents it occurs in, appending each document number where the word appears. (I would also like the number of documents, but that is just the length of this new list.)
For operation 1, a sample output would be:
dictop1 = {'hello' : 1, 'able' : 5, 'of' : 15, 'advance' : 3, 'occurred' : 4, 'range' : 1, 'sold' : 4, 'several' : 3}
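Operation 1 can be sketched with `collections.Counter` from the standard library. Its `update()` method adds the counts of keys that are already present instead of overwriting them the way `dict.update()` does. A minimal sketch, reusing `dic1` and `dic2` from above; it produces the same mapping as the sample `dictop1`:

```python
from collections import Counter

dic1 = {'hello': 1, 'able': 3, 'of': 9, 'advance': 2, 'occurred': 4, 'range': 1}
dic2 = {'of': 6, 'sold': 4, 'several': 3, 'able': 2, 'advance': 1}

# Counter.update() adds counts for shared keys rather than replacing them
totals = Counter()
for d in (dic1, dic2):
    totals.update(d)

# Convert back to a plain dict (Counter keeps first-seen insertion order)
dictop1 = dict(totals)
print(dictop1)
```

`Counter(dic1) + Counter(dic2)` also works, but the `+` operator drops keys whose total is zero or negative. That does not matter for positive word counts.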
For operation 2, a sample output would be:
dictop2 = {'hello' : [1], 'able' : [1,2], 'of' : [1,2], 'advance' : [1,2], 'occurred' : [1], 'range' : [1], 'sold' : [2], 'several' : [2]}
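One way to obtain the second dictionary is a `collections.defaultdict(list)`, which creates an empty list the first time a word is seen. This is a minimal sketch under one assumption: a document's number is its 1-based position in the `docs` list, which is a name introduced here for illustration.

```python
from collections import defaultdict

dic1 = {'hello': 1, 'able': 3, 'of': 9, 'advance': 2, 'occurred': 4, 'range': 1}
dic2 = {'of': 6, 'sold': 4, 'several': 3, 'able': 2, 'advance': 1}

# Assumption: document number = 1-based position in this list
docs = [dic1, dic2]

# Each word maps to the list of document numbers it occurs in
dictop2 = defaultdict(list)
for doc_no, d in enumerate(docs, start=1):
    for word in d:
        dictop2[word].append(doc_no)

dictop2 = dict(dictop2)
print(dictop2)
```

The per-word document count is then `len(dictop2[word])`; for example, `len(dictop2['of'])` is 2.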
I will be iterating through thousands of such dictionaries and performing both operations on them.
At the end, I need a DataFrame with the following columns:
Word | Count | DocsOccuredIn
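With thousands of dictionaries, one option is to perform both operations in a single pass and build the DataFrame once at the end, instead of merging dictionaries pairwise or growing a DataFrame inside the loop. This is a minimal sketch that assumes pandas is installed. `build_word_table` and the extra `NumDocs` column are hypothetical names, and document numbers are assumed to be 1-based positions in the input.

```python
from collections import Counter, defaultdict

import pandas as pd


def build_word_table(docs):
    """Hypothetical helper: fold word-count dicts into one DataFrame.

    `docs` may be any iterable, e.g. a generator yielding one dict per
    document; document numbers are assumed to be 1-based positions in it.
    """
    totals = Counter()           # operation 1: summed counts per word
    doc_ids = defaultdict(list)  # operation 2: document numbers per word
    for doc_no, d in enumerate(docs, start=1):
        totals.update(d)
        for word in d:
            doc_ids[word].append(doc_no)

    # Build one row per word in a single step at the end
    return pd.DataFrame({
        'Word': list(totals),
        'Count': [totals[w] for w in totals],
        'DocsOccuredIn': [doc_ids[w] for w in totals],
    })


dic1 = {'hello': 1, 'able': 3, 'of': 9, 'advance': 2, 'occurred': 4, 'range': 1}
dic2 = {'of': 6, 'sold': 4, 'several': 3, 'able': 2, 'advance': 1}

df = build_word_table([dic1, dic2])
# Optional extra column: number of documents each word occurs in
df['NumDocs'] = df['DocsOccuredIn'].map(len)
print(df)
```

Because `docs` only needs to be iterable, the dictionaries can be produced lazily one document at a time, so they never all have to be held in memory at once.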
How would I go about doing this?
One possible solution is to build the two dictionaries above separately, create two DataFrames, and merge them. In that case, how can I obtain the second dictionary? Or is there a better way to approach this problem?
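The two-DataFrame approach can be sketched with pandas `DataFrame.merge`, joining on the word. This is a minimal sketch that assumes pandas is installed and starts from the sample `dictop1` and `dictop2` outputs shown above. The intermediate names `counts` and `docs` are illustrative.

```python
import pandas as pd

# Sample outputs for operations 1 and 2, copied from the question
dictop1 = {'hello': 1, 'able': 5, 'of': 15, 'advance': 3, 'occurred': 4,
           'range': 1, 'sold': 4, 'several': 3}
dictop2 = {'hello': [1], 'able': [1, 2], 'of': [1, 2], 'advance': [1, 2],
           'occurred': [1], 'range': [1], 'sold': [2], 'several': [2]}

counts = pd.DataFrame({'Word': list(dictop1), 'Count': list(dictop1.values())})
docs = pd.DataFrame({'Word': list(dictop2), 'DocsOccuredIn': list(dictop2.values())})

# Join on the word; both dicts share the same keys, so an inner join keeps every word
df = counts.merge(docs, on='Word', how='inner')
print(df)
```

Since both dictionaries are keyed by exactly the same words, the merge is not strictly required. The `DocsOccuredIn` column could equally be built as `[dictop2[w] for w in dictop1]` alongside the counts.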