Currently I am able to retrieve an array of summary statistics from a large groupby object (for example, the groupby object has 2000 dataframes, and I retrieve the mean value of each dataframe's 'z' column).
To do this I use the following process:
vals = mygroupby.aggregate(np.mean)['z'].values
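For anyone who wants to reproduce this, here is a minimal sketch of the setup. The data is made up: the 'group' key column, the random values, and the 50 rows per group are assumptions for illustration only, and the string alias "mean" stands in for np.mean.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 2000 groups keyed by a made-up 'group' column, 50 rows each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(np.arange(2000), 50),
    "z": rng.normal(size=2000 * 50),
})
mygroupby = df.groupby("group")

# One mean per group for column 'z', returned as a plain NumPy array
vals = mygroupby.aggregate("mean")["z"].values
```

The result is a one-dimensional array with one entry per group, which is the shape I also want for the percentiles.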
I am also able to do this with np.std, np.var, etc. However, I would like to do this with np.percentile (i.e., return an array of the 90th percentile of every group in the groupby object), but this requires additional arguments. This is what I have tried:
vals = mygroupby.aggregate(np.percentile(90))['z'].values
This fails with the following error:
TypeError: percentile() missing 1 required positional argument: 'q'
I understand this is because np.percentile needs both the data and the percentile q, and I am only passing one of them. How do I tell np.percentile that the data is each group being aggregated, similar to how np.mean works?
Edit:
Performance is a concern here. Using a lambda function as the argument to aggregate slows things down drastically, whereas the np.mean example executes very quickly.
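For reference, the lambda variant presumably looks something like the sketch below. The data is again hypothetical (a made-up 'group' column and random 'z' values). The second call, the groupby object's own quantile method, is not something from my original attempt; it is included only as a candidate that may be worth timing against the lambda. Note that it takes a fraction (0.9) rather than a percentage (90).

```python
import numpy as np
import pandas as pd

# Hypothetical data: 200 groups keyed by a made-up 'group' column, 50 rows each
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": np.repeat(np.arange(200), 50),
    "z": rng.normal(size=200 * 50),
})
mygroupby = df.groupby("group")

# Lambda variant: np.percentile is called once per group from Python
vals_lambda = mygroupby["z"].aggregate(lambda x: np.percentile(x, 90)).to_numpy()

# Built-in groupby quantile: expects a fraction in [0, 1], so 0.9 for the 90th percentile
vals_quantile = mygroupby["z"].quantile(0.9).to_numpy()
```

Both calls use linear interpolation by default, so the two arrays should agree up to floating-point error. Whether the second one is fast enough would need to be measured on the real data.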