I have a dataset, df, where I would like to:
filter the values in the 'id' column, group these values, take their average, and then sum these values
id use free total G_Used G_Free G_Total
a 5 5 10 4 1 5
b 14 6 20 5 1 6
a 10 5 15 9 1 10
c 6 4 10 10 10 20
b 10 5 15 5 5 10
b 5 5 10 1 4 5
c 4 1 5 3 1 4
Desired Output
use free total
9.5 7.5 20
filter only values that contain 'a' or 'b'
group by each id
take the mean of the 'use', 'free' and 'total' columns
sum these values
Intermediate steps:
filter out only the a and c values
id use free total G_Used G_Free G_Total
a 5 5 10 4 1 5
a 10 5 15 9 1 10
c 6 4 10 10 10 20
c 4 1 5 3 1 4
take mean of a
a
use free total
7.5 5 12.5
take mean of c
c
use free total
2 2.5 7.5
sum both a and c values for final desired output
use free total
9.5 7.5 20
This is what I am doing, however the syntax is not correct for some of the code. I am still researching. Any suggestion is appreciated
df1 = df[df.id = 'a' | 'b']
df2 = df1.groupby(['id'], as_index=False).agg({'use': 'mean', 'free': 'mean', 'total': 'mean'})
df3= df2.sum(['id'], axis = 0)