create a new dataframe based on given dataframe

Question

I have a table that looks like this:

user id	observation
25	2
25	3
25	2
23	1
23	3

the desired outcome is:

user id	observation	retention
25	7	3
23	4	2

I want to keep the user id column with unique ids and have another column showing how many times this id has appeared in the dataset summing up the observation column values.

any help will be appreciated thanks

score 1 · Accepted Answer · answered Apr 14 '21 at 04:29

Use groupby() method and chain agg() method to it:

outputdf=df.groupby('user id',as_index=False).agg(observation=('observation','sum'),retention=('observation','count'))

Now if you print outputdf you will get your desired output:

     user id    observation     retention
0       23          4               2
1       25          7               3

score 1 · Answer 2 · answered Apr 14 '21 at 05:09

You have to use group by:

import pandas as pd 

d = {'user id': [25,25,25,33,33], 'observation': [2,3,2,1,3]}

# get the dataframe
df = pd.DataFrame(data=d)

df_new = df.groupby('user id').agg({"sum", "count"}).reset_index()

# rename the columns as you desire
df_new.columns = ['user id', 'observation', 'retention']
df_new

Output:

create a new dataframe based on given dataframe

2 Answers2