
So I have a list of people, and each of them is given two or more books (up to 4 books are possible). I want to do a groupby and check the frequency of each combination of books received. For example, given [ID, books] with ID: 1 and Books: A, B, I want to know how many people received the book combination of A and B.

Technically, if someone has books A, B and C, he has the combinations (A,B), (A,C), (B,C) and (A,B,C).
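The enumeration described above can be sketched with the standard library's `itertools.combinations`. This is a minimal sketch, assuming single books are not counted as a combination (minimum size 2) and that duplicate books for one person are ignored; `book_combinations` is a hypothetical helper name. For n distinct books this yields 2^n - n - 1 combinations (the power set minus the empty set and the n single books), which is 4 for A, B, C.

```python
from itertools import combinations


def book_combinations(books, min_size=2):
    """Hypothetical helper: every combination of at least `min_size` distinct books."""
    items = sorted(set(books))  # drop duplicates; sorting gives each combination one canonical order
    result = []
    for size in range(min_size, len(items) + 1):
        result.extend(combinations(items, size))
    return result


print(book_combinations(['A', 'B', 'C']))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('A', 'B', 'C')]

# For n distinct books there are 2**n - n - 1 such combinations
print(len(book_combinations(['A', 'B', 'C', 'D'])))  # 11
```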

Input:

import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 2, 3, 3, 3],
                   'disease': ['a', 'b', 'b', 'c', 'a', 'b', 'c']})
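One way to get the frequency of every combination without naming a target in advance is to enumerate each user's combinations and count them. This is a sketch, not the answerer's method; it assumes each user's items are counted once (duplicates ignored) and only combinations of two or more items are counted, as in the question.

```python
import pandas as pd
from collections import Counter
from itertools import combinations

df = pd.DataFrame({'user': [1, 1, 2, 2, 3, 3, 3],
                   'disease': ['a', 'b', 'b', 'c', 'a', 'b', 'c']})

# One sorted list of distinct items per user
per_user = {user: sorted(set(items)) for user, items in df.groupby('user')['disease']}

# Count each combination of size >= 2 once for every user that holds it
counts = Counter(
    combo
    for items in per_user.values()
    for size in range(2, len(items) + 1)
    for combo in combinations(items, size)
)

print(counts.most_common())
# [(('a', 'b'), 2), (('b', 'c'), 2), (('a', 'c'), 1), (('a', 'b', 'c'), 1)]
```

Because each person holds at most 4 books, a user contributes at most 2**4 - 4 - 1 = 11 combinations, so this stays small even with 20 or more book types: only combinations that actually occur are counted, never the full power set.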


P.P
  • Can you provide an example of dataset? Ensure you read [how to provide reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) first – mozway Apr 01 '22 at 02:01
  • Sorry, this was my first time. import pandas as pd; df = pd.DataFrame({'user': [1, 1, 2, 2, 3, 3, 3], 'disease': ['a', 'b', 'b', 'c', 'a', 'b', 'c']}) – P.P Apr 01 '22 at 03:16

1 Answer


You can use set operations.

Identify users with a given target combination:

target = {'a', 'b'}
df.groupby('user')['disease'].agg(lambda x: target.issubset(x))

Output:

user
1     True
2    False
3     True
Name: disease, dtype: bool

Count the number of users that match the target:

target = {'a', 'b'}
df.groupby('user')['disease'].agg(lambda x: target.issubset(x)).sum()

Output: 2

mozway
  • Thank you mozway, but I was thinking of not using a given target combination, because there could be lots of combinations: I have more than 20 different book types. That is why I have been stuck for a long time. – P.P Apr 01 '22 at 03:38
  • @XinfaP. Thus the importance of clearly defining your expected output. We cannot invent the logic for you. For 20 types, you have more than 1 million possible combinations (see [power set](https://en.m.wikipedia.org/wiki/Power_set)), is this something you want to work with? – mozway Apr 01 '22 at 03:51
  • Sorry, that was my bad. I think I would want to create a new column whenever a new combination shows up. For example, if there is a person with the combination of A and B, a new column [A, B] is created and flagged 1 for this person. If a combination column has already been created, the next person with that combination just gets a 1 in the existing column. Again, I apologize for the mistake and will improve next time! – P.P Apr 01 '22 at 04:12
  • Well you can still update your question. Think about it carefully, come up with a clear minimal example, edit your question to add it and I'll receive a notification. – mozway Apr 01 '22 at 04:18
  • I added a photo of my clearest example of the ideal output so far, because the table structure looked different when I typed it as text. – P.P Apr 01 '22 at 04:38
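The layout described in the comments (a new column appears the first time a combination is seen, and later users with that combination just get a 1 in it) can be sketched as a 0/1 table built with `pd.crosstab`. This is a minimal sketch under the assumptions that only combinations of two or more distinct items count and that column labels like 'a, b' are an acceptable naming scheme (an arbitrary choice here). Only combinations that actually occur become columns, so the table never grows toward the million-plus possible combinations for 20 book types.

```python
import pandas as pd
from itertools import combinations

df = pd.DataFrame({'user': [1, 1, 2, 2, 3, 3, 3],
                   'disease': ['a', 'b', 'b', 'c', 'a', 'b', 'c']})

# Long format: one row per (user, combination) that the user holds
rows = []
for user, items in df.groupby('user')['disease']:
    distinct = sorted(set(items))
    for size in range(2, len(distinct) + 1):
        for combo in combinations(distinct, size):
            rows.append({'user': user, 'combo': ', '.join(combo)})
long_df = pd.DataFrame(rows)

# Wide format: one column per observed combination, 1 if the user holds it, else 0
flags = pd.crosstab(long_df['user'], long_df['combo'])
print(flags)

# Column sums give the number of users holding each combination
print(flags.sum())
```

A user with fewer than two distinct books produces no combinations and therefore no row; if every user must appear, `flags.reindex(df['user'].unique(), fill_value=0)` adds them back with all zeros.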