1

I have a table that looks like this:

user id observation
25 2
25 3
25 2
23 1
23 3

the desired outcome is:

user id observation retention
25 7 3
23 4 2

I want to keep the user id column with unique ids and have another column showing how many times this id has appeared in the dataset summing up the observation column values.

any help will be appreciated thanks

Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
zuy007
  • 37
  • 5

2 Answers2

1

Use groupby() method and chain agg() method to it:

outputdf=df.groupby('user id',as_index=False).agg(observation=('observation','sum'),retention=('observation','count'))

Now if you print outputdf you will get your desired output:

     user id    observation     retention
0       23          4               2
1       25          7               3
Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
1

You have to use group by:

import pandas as pd 

d = {'user id': [25,25,25,33,33], 'observation': [2,3,2,1,3]}

# get the dataframe
df = pd.DataFrame(data=d)

df_new = df.groupby('user id').agg({"sum", "count"}).reset_index()

# rename the columns as you desire
df_new.columns = ['user id', 'observation', 'retention']
df_new

Output: enter image description here

Sulphur
  • 514
  • 6
  • 24