Multiple groupings and aggregations within one dataframe using Python

Question

I have a dataset, df, where I would like to:

filter the values in the 'id' column, group these values, take their average, and then sum these values

  id    use free    total   G_Used  G_Free  G_Total
   a    5   5       10      4       1       5
   b    14  6       20      5       1       6
   a    10  5       15      9       1       10
   c    6   4       10      10      10      20
   b    10  5       15      5       5       10
   b    5   5       10      1       4       5
   c    4   1       5       3       1       4

Desired Output

use   free  total   
9.5   7.5   20

filter only values that contain 'a' or 'b'
group by each id
take the mean of the 'use', 'free' and 'total' columns
sum these values

Intermediate steps:

filter out only the a and c values

id  use free    total   G_Used  G_Free  G_Total
a   5   5       10      4       1       5
a   10  5       15      9       1       10
c   6   4       10      10      10      20
c   4   1       5       3       1       4

take mean of a

a   
    
use free    total   
7.5 5       12.5

take mean of c

c

use free    total   
2   2.5     7.5

sum both a and c values for final desired output

use free    total   
9.5 7.5     20

This is what I am doing, however the syntax is not correct for some of the code. I am still researching. Any suggestion is appreciated

df1 = df[df.id = 'a' | 'b']
df2 = df1.groupby(['id'], as_index=False).agg({'use': 'mean', 'free': 'mean', 'total': 'mean'})
df3= df2.sum(['id'], axis = 0)

jezrael · Accepted Answer · 2020-12-02T09:22:44.893

2

Use Series.isin for test membership first and then filter columns with mean, ouput is summed and converted Series to one row DataFrame by Series.to_frame and DataFrame.T for transpose:

df1 = df[df.id.isin(['a','c'])]
df2 = df1.groupby('id')[['use','free','total']].mean().sum().to_frame().T

Your solution is similar, only used GroupBy.agg:

df1 = df[df.id.isin(['a','c'])]
df2 = df1.groupby('id').agg({'use': 'mean', 'free': 'mean', 'total': 'mean'}).sum().to_frame().T

print (df2)
    use  free  total
0  12.5   7.5   20.0

edited Dec 02 '20 at 09:22

answered Dec 02 '20 at 09:18

jezrael

822,522
95
1,334
1,252

Hi @Jezrael, what if I have a scenario where I wish to filter if the value CONTAINS a certain word , or IS LIKE- is there something like this? I am researching- thank you – Lynn Dec 02 '20 at 09:20
1

@Lynnette - yes, then use instead `isin` this solution - [link](https://stackoverflow.com/a/26577689) – jezrael Dec 02 '20 at 09:54
1

@Lynnette - So here `df1 = df[df.id.str.contains('|'.join(['a','c']))]` – jezrael Dec 02 '20 at 09:55
Hi @Jezrael - thank you- I think my data contains NaN -It gives me this error: Cannot mask with non-boolean array containing NA / NaN values - Is there a way around this? Thank you – Lynn Dec 02 '20 at 17:56
1

@Lynnette Use `df1 = df[df.id.str.contains('|'.join(['a','c']), na=False)]` – jezrael Dec 02 '20 at 18:02

Multiple groupings and aggregations within one dataframe using Python

1 Answers1