
So I have a DataFrame that contains data for experiments with different hyperparameters, plus a special column called repeat_id: each experiment has to be run several times so that we can check for statistical significance. I'm basically trying to compute the average of every other recorded value over the different repeat_id values. This is different from the standard GroupBy.mean, which computes the mean within a group; here I need something like a mean across groups. Example:

   repeat_id variant  measurement_0 measurement_1 ... measurement_n
0  0         'A'      0.0           1.0               2.0
1  1         'A'      0.2           0.4               0.6
2  0         'B'      0.1           1.1               2.1
3  1         'B'      0.3           0.5               0.7

Expected output:

   variant  measurement_0 measurement_1 ... measurement_n
0  'A'      0.1           0.7               1.3
1  'B'      0.2           0.8               1.4
Alex Botev
  • Am I missing something? It seems like this is just `df.groupby('variant').mean()` (ignoring the repeat_id column). – ALollz Jul 21 '21 at 14:57
  • Does this answer your question? [Pandas sum by groupby, but exclude certain columns](https://stackoverflow.com/questions/32751229/pandas-sum-by-groupby-but-exclude-certain-columns) – Alex Jul 21 '21 at 15:04
  • So something like: `df.groupby("variant")[[c for c in df.columns if c.startswith("measurement")]].mean()` – Alex Jul 21 '21 at 15:13
  • I tried to post a simplified version, but in reality there are many more columns that together identify a single experiment. I guess I can just group by all of those. – Alex Botev Jul 21 '21 at 16:19
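Following up on the last comment, here is a minimal sketch of the "group by everything else" idea. It assumes that the measurement columns share the `measurement` prefix and that every remaining column except repeat_id is an experiment setting. The `lr` column is a hypothetical extra hyperparameter added only for illustration:

```python
import pandas as pd

# Example data from the question (two measurement columns shown).
# "lr" is a hypothetical extra hyperparameter, added only for illustration.
df = pd.DataFrame({
    "repeat_id": [0, 1, 0, 1],
    "variant": ["A", "A", "B", "B"],
    "lr": [0.01, 0.01, 0.01, 0.01],
    "measurement_0": [0.0, 0.2, 0.1, 0.3],
    "measurement_1": [1.0, 0.4, 1.1, 0.5],
})

# Assumed naming scheme: measurement columns are identified by their prefix
measure_cols = [c for c in df.columns if c.startswith("measurement")]

# Every other column except repeat_id describes one experiment configuration
config_cols = [c for c in df.columns if c not in measure_cols and c != "repeat_id"]

# Leaving repeat_id out of the keys means each group holds exactly the repeats
# of one configuration, so the group mean is the mean across repeats
result = df.groupby(config_cols, as_index=False)[measure_cols].mean()
print(result)
```

If any setting column can contain NaN, passing `dropna=False` to `groupby` keeps those rows instead of silently dropping them.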

1 Answer


You can also do this with the `pivot_table()` function:

df.pivot_table(index='variant', aggfunc='mean').drop(columns='repeat_id')
scotscotmcc
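A runnable check of the pivot_table approach on the question's example data (two measurement columns only). The second call is a variant, not part of the original answer: it passes `values=` explicitly, so repeat_id is never averaged and no `drop` is needed:

```python
import pandas as pd

# Example data from the question (two measurement columns shown)
df = pd.DataFrame({
    "repeat_id": [0, 1, 0, 1],
    "variant": ["A", "A", "B", "B"],
    "measurement_0": [0.0, 0.2, 0.1, 0.3],
    "measurement_1": [1.0, 0.4, 1.1, 0.5],
})

# The answer as written: average every non-index column, then drop repeat_id
wide = df.pivot_table(index="variant", aggfunc="mean").drop(columns="repeat_id")

# Variant: restrict the aggregation to the measurement columns up front
measure_cols = [c for c in df.columns if c.startswith("measurement")]
wide2 = df.pivot_table(index="variant", values=measure_cols, aggfunc="mean")

print(wide)
print(wide2)
```

`index` also accepts a list of column names, so the multi-column case from the comments works the same way with `index=[...]`.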