I need to build three dictionaries from the same CSV file. The input file is:
col1 col2 value
item1 a value1
item1 b value2
item1 c value3
item2 a value4
item2 c value5
...
And I need these three dictionaries:
1.
dict1
item1:set(a,b,c)
item2:set(a,c)
...
2.
dict2
(item1,a):value1
(item1,b):value2
(item1,c):value3
(item2,a):value4
(item2,c):value5
I need to use sets as values in the first dictionary because I will then have to intersect those values, and I think a set is the most suitable type for that.
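For concreteness, here is a minimal sketch of how dict1 and dict2 might be built with pandas. It assumes the file is whitespace-delimited as in the sample above, that each (col1, col2) pair occurs only once, and it uses the numbers 1 to 5 as made-up placeholders for value1 to value5; the in-memory string stands in for the real file.

```python
import io
import pandas as pd

# Stand-in for the input file; 1..5 are placeholders for value1..value5.
raw = """col1 col2 value
item1 a 1
item1 b 2
item1 c 3
item2 a 4
item2 c 5
"""

# Assumption: columns are separated by whitespace, as in the sample.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# dict1: each col1 item mapped to the set of its col2 entries.
dict1 = df.groupby("col1")["col2"].apply(set).to_dict()

# dict2: each (col1, col2) pair mapped to its value.
# The keys are tuples, which are hashable (sets cannot be dictionary keys).
dict2 = df.set_index(["col1", "col2"])["value"].to_dict()
```

With a real file, the `io.StringIO(raw)` argument would be replaced by the file path.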
My final dictionary, resulting from these intersections, will be something like:
3.
dict3
(item1,item2):value1+value4+value3+value5
It is probably easier to understand from the examples, but let me explain: dict3 takes the pairwise intersections between the values of dict1, which for item1 and item2 in my example is {a, c}. For each shared element it adds the two corresponding dict2 values, so the shared element a contributes dict2.get((item1,a))+dict2.get((item2,a)), and the sum over all shared elements is assigned as the value of the pair (item1,item2). If item1 and item2 also had another element in common, say d, the value for (item1,item2) would additionally include dict2.get((item1,d))+dict2.get((item2,d)). Please note that in the real dataset the col1 and col2 items are strings.
This calculation is repeated for every pairwise intersection of the values in dict1.
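To make the rule concrete, the calculation could be sketched in plain Python roughly as follows. The input dictionaries are written out by hand with 1 to 5 as placeholders for value1 to value5, and leaving out pairs that share nothing is my own assumption.

```python
from itertools import combinations

# Hand-written inputs mirroring the example; 1..5 stand in for value1..value5.
dict1 = {"item1": {"a", "b", "c"}, "item2": {"a", "c"}}
dict2 = {
    ("item1", "a"): 1,
    ("item1", "b"): 2,
    ("item1", "c"): 3,
    ("item2", "a"): 4,
    ("item2", "c"): 5,
}

dict3 = {}
# Visit every unordered pair of items exactly once.
for left, right in combinations(sorted(dict1), 2):
    shared = dict1[left] & dict1[right]  # set intersection
    if shared:  # assumption: pairs with nothing in common are skipped
        dict3[(left, right)] = sum(
            dict2[(left, key)] + dict2[(right, key)] for key in shared
        )
```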
What's the easiest way to get these dictionaries? I am more comfortable using pandas, so I'd ask you to suggest solutions using a dataframe, but I can accept anything which reads directly from the external file as well, since this comes into play only in the very first stage.
EDIT: I should clarify that I need pairwise intersections, which the example I gave doesn't really show. For a better example to work with, try:
import pandas as pd

df = pd.DataFrame({
    'col1': ['item1', 'item1', 'item1', 'item2', 'item2', 'item3', 'item3'],
    'col2': ['a', 'b', 'c', 'a', 'd', 'a', 'c'],
    'value': [1, 2, 3, 4, 5, 6, 7],
})
and try to get as a result:
dict3
(item1,item2):5
(item1,item3):17
(item2,item3):10
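One possible dataframe-based route to these numbers is a self-merge on col2, sketched below. It assumes each (col1, col2) pair appears only once and that ordering the pair by plain string comparison is acceptable; the names pairs and total are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["item1", "item1", "item1", "item2", "item2", "item3", "item3"],
    "col2": ["a", "b", "c", "a", "d", "a", "c"],
    "value": [1, 2, 3, 4, 5, 6, 7],
})

# Self-join on col2: each resulting row pairs two rows that share a col2 entry.
pairs = df.merge(df, on="col2", suffixes=("_x", "_y"))

# Keep each unordered pair once and drop an item paired with itself.
pairs = pairs[pairs["col1_x"] < pairs["col1_y"]]

# Add the two values for each shared entry, then total them per pair.
pairs = pairs.assign(total=pairs["value_x"] + pairs["value_y"])
dict3 = pairs.groupby(["col1_x", "col1_y"])["total"].sum().to_dict()
```

Pairs with an empty intersection never appear in the merge, so they are simply absent from dict3 here.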
It seems like a very complex problem: I found something on pairwise set intersection here but I can't find a final solution.