I would like to count the number of pairs of items in a column. I made my solution for it, but I would like to know if there are more concise solutions.
Here is an example and my approach. I have a DataFrame like this.
df = pd.DataFrame({'id':[1,1,2,2], 'item':['apple','orange','orange','apple']})
Finally, I would like to know what items are bought most together. Therefore, in this case, I'd like to get a result that orange and apple are bought most together.
Then, I did groupby based on values in id
column.
id_group = df.groupby('id')
Then, to count the number of pairs of items in item
column, I made a function like below and applied to item
column of id_group
. After this, I combined lists of tuples using sum()
. Finally, I used Counter()
to count the number of pairs containing same items.
In combos()
, I used sorted()
to avoid counting ('apple','orange')
and ('orange','apple')
separately.
Are there better approaches to get the result showing there are 2 pairs of ('apple','orange')
or 2 pairs of ('orange','apple')
import itertools
from collections import Counter
def combos(x):
combinations = []
num = x.size
while num != 1:
combinations += list(itertools.combinations(x,num))
num -= 1
element_sorted = map(sorted,combinations)
return list(map(tuple,element_sorted))
k= id_group['item'].apply(lambda x:combos(x)).sum()
Counter(k)