1

I have a dataset, df, where I would like to:

filter the values in the 'id' column, group these values, take their average, and then sum these values

  id    use free    total   G_Used  G_Free  G_Total
   a    5   5       10      4       1       5
   b    14  6       20      5       1       6
   a    10  5       15      9       1       10
   c    6   4       10      10      10      20
   b    10  5       15      5       5       10
   b    5   5       10      1       4       5
   c    4   1       5       3       1       4

Desired Output

use   free  total   
9.5   7.5   20  
  • filter only values that contain 'a' or 'b'

  • group by each id

  • take the mean of the 'use', 'free' and 'total' columns

  • sum these values

Intermediate steps:

filter out only the a and c values

id  use free    total   G_Used  G_Free  G_Total
a   5   5       10      4       1       5
a   10  5       15      9       1       10
c   6   4       10      10      10      20
c   4   1       5       3       1       4

take mean of a

a   
    
use free    total   
7.5 5       12.5    

take mean of c

c

use free    total   
2   2.5     7.5 

sum both a and c values for final desired output

use free    total   
9.5 7.5     20  

This is what I am doing, however the syntax is not correct for some of the code. I am still researching. Any suggestion is appreciated

df1 = df[df.id = 'a' | 'b']
df2 = df1.groupby(['id'], as_index=False).agg({'use': 'mean', 'free': 'mean', 'total': 'mean'})
df3= df2.sum(['id'], axis = 0)
Lynn
  • 4,292
  • 5
  • 21
  • 44

1 Answers1

2

Use Series.isin for test membership first and then filter columns with mean, ouput is summed and converted Series to one row DataFrame by Series.to_frame and DataFrame.T for transpose:

df1 = df[df.id.isin(['a','c'])]
df2 = df1.groupby('id')[['use','free','total']].mean().sum().to_frame().T

Your solution is similar, only used GroupBy.agg:

df1 = df[df.id.isin(['a','c'])]
df2 = df1.groupby('id').agg({'use': 'mean', 'free': 'mean', 'total': 'mean'}).sum().to_frame().T

print (df2)
    use  free  total
0  12.5   7.5   20.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Hi @Jezrael, what if I have a scenario where I wish to filter if the value CONTAINS a certain word , or IS LIKE- is there something like this? I am researching- thank you – Lynn Dec 02 '20 at 09:20
  • 1
    @Lynnette - yes, then use instead `isin` this solution - [link](https://stackoverflow.com/a/26577689) – jezrael Dec 02 '20 at 09:54
  • 1
    @Lynnette - So here `df1 = df[df.id.str.contains('|'.join(['a','c']))]` – jezrael Dec 02 '20 at 09:55
  • Hi @Jezrael - thank you- I think my data contains NaN -It gives me this error: Cannot mask with non-boolean array containing NA / NaN values - Is there a way around this? Thank you – Lynn Dec 02 '20 at 17:56
  • 1
    @Lynnette Use `df1 = df[df.id.str.contains('|'.join(['a','c']), na=False)]` – jezrael Dec 02 '20 at 18:02