I am working with web logs and have data containing account_id and session_id. Multiple sessions can be associated with one account. I want to create a new dataframe containing account_id and the number of unique sessions associated with each account. My df looks like this:
account_id session_id
1111 de322
1111 de322
1111 de322
1111 de323
1111 de323
0210 ge012
0210 ge013
0211 ge330
0213 ge333
I'm using this code:
new_df = df.groupby(['account_id','session_id']).sum()
The output I am getting is below:
account_id sessions
1111       de322
           de323
0210       ge012
           ge013
0211       ge330
0213       ge333
The output I'm expecting is:
account_id sessions
1111 2
0210 2
0211 1
0213 1
How should I fix it?
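
One possible approach, as a minimal sketch: grouping by both account_id and session_id makes every (account, session) pair its own group, so there is nothing left to count per account. Grouping by account_id alone and calling nunique() on session_id should give the count of distinct sessions. The sample frame below is rebuilt from the data shown above; storing account_id as strings (to keep the leading zeros) and passing sort=False (to keep the order of first appearance) are assumptions, not something stated in the question.

```python
import pandas as pd

# Sample data rebuilt from the question; account_id is kept as a string
# (assumption) so that leading zeros such as "0210" are preserved.
df = pd.DataFrame({
    "account_id": ["1111", "1111", "1111", "1111", "1111",
                   "0210", "0210", "0211", "0213"],
    "session_id": ["de322", "de322", "de322", "de323", "de323",
                   "ge012", "ge013", "ge330", "ge333"],
})

# Group by account only, then count the distinct session_id values in each group.
# sort=False keeps accounts in the order they first appear in df.
new_df = (
    df.groupby("account_id", sort=False)["session_id"]
      .nunique()
      .reset_index(name="sessions")
)

print(new_df)
```

If the accounts should come back sorted instead, dropping sort=False uses the default sorted group keys.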