How do I sum unique values per column in Python?

Question

I am working with weblogs and have data containing account_id and session_id. Multiple sessions can be associated with one account. I want to create a new dataframe containing account_id and the number of unique sessions associated with that account. My df looks like this:

account_id session_id
 1111          de322
 1111          de322
 1111          de322
 1111          de323
 1111          de323
 0210          ge012
 0210          ge013
 0211          ge330
 0213          ge333

I'm using this code:

new_df = df.groupby(['account_id','session_id']).sum()

The output I am getting is below:

 account_id     sessions
 1111           de322
                de323
 0210           ge012 
                ge013 
 0211           ge330
 0213           ge333

The output I'm expecting:

account_id   sessions
 1111           2
 0210           2  
 0211           1
 0213           1

How should I fix it?

Nihal · Answer 1 · 2018-08-14T13:46:17.367

3

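import pandas as pd

# Sample data: one row per logged event
df = pd.DataFrame({'session': ['de322', 'de322', 'de322', 'de323', 'de323', 'ge012', 'ge012', 'ge013', 'ge333'],
                   'user_id': [1111, 1111, 1111, 1111, 1111, 210, 210, 210, 211],
                   })
print(df)

# Drop repeated (session, user_id) rows, then count the remaining sessions per user
df = df.drop_duplicates().groupby('user_id').count()
print(df)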

Output of the second print:

         session
user_id
210            2
211            1
1111           2
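As an alternative to dropping duplicates first, the same count can probably be obtained with `nunique()` on the grouped column. The sketch below uses the column names and rows from the question. Storing account_id as a string (to keep the leading zero in "0210") and naming the result column `sessions` are assumptions, not something the question confirms.

```python
import pandas as pd

# Rows from the question. account_id is stored as a string here so that
# leading zeros ("0210") survive; this is an assumption about the real data.
df = pd.DataFrame({
    "account_id": ["1111"] * 5 + ["0210", "0210", "0211", "0213"],
    "session_id": ["de322", "de322", "de322", "de323", "de323",
                   "ge012", "ge013", "ge330", "ge333"],
})

# Count the distinct session_id values within each account, then turn the
# resulting Series back into a two-column DataFrame.
new_df = (
    df.groupby("account_id")["session_id"]
      .nunique()
      .reset_index(name="sessions")
)
print(new_df)
```

Grouping by account_id alone is the key difference from the code in the question, which grouped by both columns and therefore produced one group per (account, session) pair.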

edited Aug 14 '18 at 13:46

answered Aug 14 '18 at 13:22


In your script, you mixed up account_id and session_id, and the numbers are still not what I'm expecting. Within account_id 1111, there are 2 UNIQUE sessions across 5 events. I am trying to count unique sessions per account, not the total number of sessions. – Tadas Melnikas Aug 14 '18 at 13:37
OK, let me write the code again – Nihal Aug 14 '18 at 13:38
See, I have updated the answer – Nihal Aug 14 '18 at 13:46
Thank you very much for your help, it does work! – Tadas Melnikas Aug 14 '18 at 13:55
can you accept the answer? – Nihal Aug 14 '18 at 13:56
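
The distinction raised in the comments (5 events but 2 unique sessions for account 1111) can be checked side by side. This is a minimal sketch using the account 1111 rows from the question. The output column names `events` and `sessions` are illustrative choices, and the keyword form of `agg` assumes pandas 0.25 or later.

```python
import pandas as pd

# The five rows for account 1111 from the question.
df = pd.DataFrame({
    "account_id": ["1111"] * 5,
    "session_id": ["de322", "de322", "de322", "de323", "de323"],
})

# Named aggregation: 'events' counts every row in the group,
# 'sessions' counts only the distinct session_id values.
summary = df.groupby("account_id").agg(
    events=("session_id", "size"),
    sessions=("session_id", "nunique"),
)
print(summary)
```

Here `size` reports the total number of rows per account, while `nunique` reports the number of distinct sessions, which is the figure the question asks for.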
