Counting the number of pairs of items in a grouped dataframe's column. (Pandas)

Question

I would like to count the number of pairs of items in a column. I made my solution for it, but I would like to know if there are more concise solutions.

Here is an example and my approach. I have a DataFrame like this.

df = pd.DataFrame({'id':[1,1,2,2], 'item':['apple','orange','orange','apple']})

Finally, I would like to know what items are bought most together. Therefore, in this case, I'd like to get a result that orange and apple are bought most together.

Then, I did groupby based on values in id column.

id_group = df.groupby('id')

Then, to count the number of pairs of items in item column, I made a function like below and applied to item column of id_group. After this, I combined lists of tuples using sum(). Finally, I used Counter() to count the number of pairs containing same items. In combos(), I used sorted()to avoid counting ('apple','orange') and ('orange','apple') separately.

Are there better approaches to get the result showing there are 2 pairs of ('apple','orange') or 2 pairs of ('orange','apple')

import itertools 
from collections import Counter
def combos(x):
     combinations = []
     num = x.size
     while num != 1:
          combinations += list(itertools.combinations(x,num))
          num -= 1
     element_sorted = map(sorted,combinations)
     return list(map(tuple,element_sorted))

k= id_group['item'].apply(lambda x:combos(x)).sum()
Counter(k)

jezrael · Accepted Answer · 2020-05-28T08:16:52.667

2

Use all_subsets function with change 0 to 2 for pairs, triples... like your soluton:

#https://stackoverflow.com/a/5898031
from itertools import chain, combinations
def all_subsets(ss):
    return chain(*map(lambda x: combinations(ss, x), range(2, len(ss)+1)))

And then flatten values, I think better is not use sum to concatenate lists. It looks fancy but it's quadratic and should be considered bad practice.

So here is used flattening with sorted tuples in list comprehension:

k = [tuple(sorted(z)) for y in id_group['item'].apply(all_subsets) for z in y]

print (Counter(k))
Counter({('apple', 'orange'): 2})

edited May 28 '20 at 08:16

answered May 27 '20 at 08:19

jezrael

822,522
95
1,334
1,252

Thank you so much for your very helpful answer ! I really appreciate it. – Sherlock_Hound May 27 '20 at 08:41
@Sherlock_Hound - Solution was unaccepted, there is some problem? – jezrael May 28 '20 at 05:51
I apologize for messing up. I guess that `sorted(y)` in the list comprehension do not change the order of items in tuples. When I ran the code, I still got `Counter({('apple', 'orange'): 1, ('orange', 'apple'): 1})`. – Sherlock_Hound May 28 '20 at 08:13
@Sherlock_Hound - You are right, sorry, edited answer. – jezrael May 28 '20 at 08:17
You don't have to be sorry. What you have done was really helpful! And It is my fault that I did not actually run your code. Thank you! – Sherlock_Hound May 28 '20 at 08:21
@Sherlock_Hound - No problem, now working and it is super ;) – jezrael May 28 '20 at 08:21

Bertil Johannes Ipsen · Answer 2 · 2020-05-27T08:15:11.490

1

How about this?

from collections import Counter
k = df.groupby('id')['item'].apply(lambda x: tuple(x.sort_values()))
Counter(k)

Counter({('apple', 'orange'): 2})

edited May 27 '20 at 08:15

answered May 27 '20 at 08:07

Bertil Johannes Ipsen

1,656
1
14
27

I think you mentioned that you are not sure about what I want to do. Sorry about that. I edited my question a little bit. I hope that it will help. And I really appreciate your contribution. – Sherlock_Hound May 27 '20 at 08:32
Np, I think @Jezrael's solution is the way to go :-) – Bertil Johannes Ipsen May 27 '20 at 08:35

Counting the number of pairs of items in a grouped dataframe's column. (Pandas)

2 Answers2