My data analysis repeatedly falls back on a simple but iffy motif, namely "groupby everything except". Take this multi-index example, df
:
accuracy velocity
name condition trial
john a 1 -1.403105 0.419850
2 -0.879487 0.141615
b 1 0.880945 1.951347
2 0.103741 0.015548
hans a 1 1.425816 2.556959
2 -0.117703 0.595807
b 1 -1.136137 0.001417
2 0.082444 -1.184703
What I want to do now, for instance, is averaging over all available trials while retaining info about names and conditions. This is easily achieved:
average = df.groupby(level=('name', 'condition')).mean()
Under real-world conditions, however, there's a lot more metadata stored in the multi-index. The index easily spans 8-10 columns per row. So the pattern above becomes quite unwieldy. Ultimately, I'm looking for a "discard" operation; I want to perform an operation that throws out or reduces a single index column. In the case above, that's trial number.
Should I just bite the bullet or is there a more idiomatic way of going about this? This might well be an anti-pattern! I want to build a decent intuition when it comes to the "true pandas way"... Thanks in advance.