pandas groupby nunique per multiple columns

Question

I have a dataframe

df = pd.DataFrame({'id':[1,1,2,2,3], 
                   'user':['u1', 'u1', 'u2', 'u2', 'u3'],
                   'date':['2021-04-25','2021-04-25','2021-04-25','2021-04-26', '2021-04-25'],
                   'sth_else1':['xx','yy','xx','xx','xx'], 'sth_else2':['zz','yy','zz','xx','xx']})

   id   user    date    sth_else1   sth_else2
0   1   u1  2021-04-25  xx          zz
1   1   u1  2021-04-25  yy          yy
2   2   u2  2021-04-25  xx          zz
3   2   u2  2021-04-26  xx          xx
4   3   u3  2021-04-25  xx          xx

and I would like to add a column (so probably use groupby with transform?) to that dataframe that shows me the number of unique user/date combinations I have per id in the whole dataframe, so that I would get this

   id   user    date    sth_else1   sth_else2   count_per_id_user_and_date
0   1   u1  2021-04-25  xx          zz          1
1   1   u1  2021-04-25  yy          yy          1
2   2   u2  2021-04-25  xx          zz          2
3   2   u2  2021-04-26  xx          xx          2
4   3   u3  2021-04-25  xx          xx          1

how would I do this?

`df['count_per_id_user_and_date'] = df.groupby(['id', 'user', 'date'])['date'].transform('size')`, not sure why you mentioned `nunique` in the title. Note that counting the unique items in sth_else2 gives the same result (`df.groupby(['id', 'user', 'date'])['sth_else2'].transform('nunique')`) — mozway, Jul 06 '23 at 12:00
I made it the example clearer: I would like to have the number of unique date/user combinations per id. So id 2 has two different user/date combinations while id 1 and 3 only have one — corianne1234, Jul 06 '23 at 12:54

score 1 · Accepted Answer · answered Jul 06 '23 at 13:04

If you want to count the number of unique combinations of user/date per id, you can first identify the non-duplicates, then groupby.transform('sum'):

df['count_per_id_user_and_date'] = (~df[['id', 'user', 'date']].duplicated()
                                   ).groupby(df['id']).transform('sum')

Or:

df['count_per_id_user_and_date'] = df['id'].map(
    df[['id', 'user', 'date']].drop_duplicates()
    .groupby('id').size()
)

Output:

   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1

Sadek Mehri · Answer 2 · 2023-07-06T13:37:30.270

0

Maybe you should try this piece of code to make it happen

df['final_result'] = df.groupby(['id', 'user', 'date'])['id'].transform('size')

edited Jul 06 '23 at 13:37

answered Jul 06 '23 at 11:59

Sadek Mehri

50
1
5

I would like to have the number of unique user/date combinations per id, this gives me the the total count – corianne1234 Jul 06 '23 at 12:56

pandas groupby nunique per multiple columns

2 Answers2