Pandas DataFrame compute average across groups

Question

So I have a dataframe that contains data for experiments with different hyperparameters and a special value called repeat_id, which identifies the repeated runs we have to do in order to establish statistical significance. I'm basically trying to compute the average of every other recorded value over the different repeat_id values. This is different from the standard GroupBy.mean, which computes the mean within a group; here I need something like a mean across groups. Example:

   repeat_id  variant  measurement_0  measurement_1  ...  measurement_n
0  0          'A'      0.0            1.0            ...  2.0
1  1          'A'      0.2            0.4            ...  0.6
2  0          'B'      0.1            1.1            ...  2.1
3  1          'B'      0.3            0.5            ...  0.7

Expected output:

   variant  measurement_0  measurement_1  ...  measurement_n
0  'A'      0.1            0.7            ...  1.3
1  'B'      0.2            0.8            ...  1.4

Am I missing something? It seems like `df.groupby('variant').mean()` would do this (just ignoring the repeat_id column). — ALollz, Jul 21 '21 at 14:57
Does this answer your question? [Pandas sum by groupby, but exclude certain columns](https://stackoverflow.com/questions/32751229/pandas-sum-by-groupby-but-exclude-certain-columns) — Alex, Jul 21 '21 at 15:04
So something like: `df.groupby("variant")[[c for c in df.columns if c.startswith("measurement")]].mean()` — Alex, Jul 21 '21 at 15:13
I guess I tried to show a simplified version, but there are a lot more columns that determine a single experiment. I guess I can group by all of those. — Alex Botev, Jul 21 '21 at 16:19
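A minimal sketch of the "group by all of those" idea from the comments, assuming the measurement columns share the `measurement` prefix and that every other column except repeat_id identifies an experiment. The `lr` column is a hypothetical extra hyperparameter added only for illustration:

```python
import pandas as pd

# Toy data modelled on the question; "lr" is a made-up hyperparameter column.
df = pd.DataFrame({
    "repeat_id": [0, 1, 0, 1],
    "variant": ["A", "A", "B", "B"],
    "lr": [0.1, 0.1, 0.01, 0.01],
    "measurement_0": [0.0, 0.2, 0.1, 0.3],
    "measurement_1": [1.0, 0.4, 1.1, 0.5],
})

# Columns to average: everything with the assumed "measurement" prefix.
measure_cols = [c for c in df.columns if c.startswith("measurement")]

# Columns that define one experiment: everything else except repeat_id.
key_cols = [c for c in df.columns if c not in measure_cols and c != "repeat_id"]

# Average the measurements over the repeats of each experiment.
averaged = df.groupby(key_cols, as_index=False)[measure_cols].mean()
print(averaged)
```

Deriving `key_cols` from the frame avoids listing every hyperparameter by hand, but it only works if no other bookkeeping columns are present.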

score 0 · Answer 1 · answered Jul 21 '21 at 16:16

You can also do this with the pivot_table() function.

df.pivot_table(index='variant',aggfunc='mean').drop(columns='repeat_id')
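For reference, here is a self-contained sketch of that one-liner applied to the sample data from the question. Three measurement columns stand in for the elided `measurement_0 ... measurement_n`, and the `reset_index()` call is an addition that turns variant back into a regular column, as in the expected output:

```python
import pandas as pd

# Sample rows from the question, with three measurement columns for brevity.
df = pd.DataFrame({
    "repeat_id": [0, 1, 0, 1],
    "variant": ["A", "A", "B", "B"],
    "measurement_0": [0.0, 0.2, 0.1, 0.3],
    "measurement_1": [1.0, 0.4, 1.1, 0.5],
    "measurement_2": [2.0, 0.6, 2.1, 0.7],
})

result = (
    df.pivot_table(index="variant", aggfunc="mean")  # mean of every other column per variant
      .drop(columns="repeat_id")                     # the averaged repeat_id is meaningless
      .reset_index()                                 # make variant a regular column again
)
print(result)
```

With only numeric columns besides variant, this should give the same numbers as a plain groupby on variant followed by a mean.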

scotscotmcc
