1

I have a dataframe

df = pd.DataFrame({'id':[1,1,2,2,3], 
                   'user':['u1', 'u1', 'u2', 'u2', 'u3'],
                   'date':['2021-04-25','2021-04-25','2021-04-25','2021-04-26', '2021-04-25'],
                   'sth_else1':['xx','yy','xx','xx','xx'], 'sth_else2':['zz','yy','zz','xx','xx']})
   id   user    date    sth_else1   sth_else2
0   1   u1  2021-04-25  xx          zz
1   1   u1  2021-04-25  yy          yy
2   2   u2  2021-04-25  xx          zz
3   2   u2  2021-04-26  xx          xx
4   3   u3  2021-04-25  xx          xx

and I would like to add a column (so probably use groupby with transform?) to that dataframe that shows me the number of unique user/date combinations I have per id in the whole dataframe, so that I would get this

   id   user    date    sth_else1   sth_else2   count_per_id_user_and_date
0   1   u1  2021-04-25  xx          zz          1
1   1   u1  2021-04-25  yy          yy          1
2   2   u2  2021-04-25  xx          zz          2
3   2   u2  2021-04-26  xx          xx          2
4   3   u3  2021-04-25  xx          xx          1

how would I do this?

corianne1234
  • 634
  • 9
  • 23
  • `df['count_per_id_user_and_date'] = df.groupby(['id', 'user', 'date'])['date'].transform('size')`, not sure why you mentioned `nunique` in the title. Note that counting the unique items in sth_else2 gives the same result (`df.groupby(['id', 'user', 'date'])['sth_else2'].transform('nunique')`) – mozway Jul 06 '23 at 12:00
  • I made it the example clearer: I would like to have the number of unique date/user combinations per id. So id 2 has two different user/date combinations while id 1 and 3 only have one – corianne1234 Jul 06 '23 at 12:54

2 Answers2

1

If you want to count the number of unique combinations of user/date per id, you can first identify the non-duplicates, then groupby.transform('sum'):

df['count_per_id_user_and_date'] = (~df[['id', 'user', 'date']].duplicated()
                                   ).groupby(df['id']).transform('sum')

Or:

df['count_per_id_user_and_date'] = df['id'].map(
    df[['id', 'user', 'date']].drop_duplicates()
    .groupby('id').size()
)

Output:

   id user        date sth_else1 sth_else2  count_per_id_user_and_date
0   1   u1  2021-04-25        xx        zz                           1
1   1   u1  2021-04-25        yy        yy                           1
2   2   u2  2021-04-25        xx        zz                           2
3   2   u2  2021-04-26        xx        xx                           2
4   3   u3  2021-04-25        xx        xx                           1
mozway
  • 194,879
  • 13
  • 39
  • 75
0

Maybe you should try this piece of code to make it happen

df['final_result'] = df.groupby(['id', 'user', 'date'])['id'].transform('size')
Sadek Mehri
  • 50
  • 1
  • 5