0

I have table which I want to get average of success tests (not NULL) per day.

For example:

    f_idx    day     test_result
0       1      1       OK
1       1      1       OK
2       2      7       NULL
3       1      3       NULL
4       2      3       NULL
5       3      1       OK
6       1      3       OK

f_ix = 1, have 2 success tests on day 1 and 1 success test on day 3

f_ix = 2, have 0 success tests on day 7 and 0 success test on day 3

f_ix = 3, have 1 success tests on day 1

I expect to get result as:

    f_idx    p_res
0       1      (1+1)/2 + (1)/2
1       2      0/1 + 0/1   
2       3      1/1   

I looked at the post here: Python Pandas : group by in group by and average?

And tried the first solution of Zero, but I'm getting error: KeyError Column not found:

my code:

print(df.columns) # I can see the column name test_result)
df.groupby(['f_idx','day']).mean().groupby('f_idx')['test_result'].mean()

I'm getting error:

Caolumn not found: test_result
user3668129
  • 4,318
  • 6
  • 45
  • 87

1 Answers1

3

First aggregate mean only not missinga values in new column created by DataFrame.assign converted to numbers and then get sum per first level, here f_idx:

df1 = (df.assign(p_res=df['test_result'].notna().view('i1'))
         .groupby(['f_idx','day'])['p_res']
         .mean()
         .sum(level=0)
         .reset_index())
print (df1)
   f_idx  p_res
0      1    1.5
1      2    0.0
2      3    1.0

Solution working same like aggregation by first level with sum:

df1 = (df.assign(p_res=df['test_result'].eq('OK').view('i1'))
         .groupby(['f_idx','day'])['p_res']
         .mean()
         .groupby(level=0)
         .sum()
         .reset_index())
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252