Using df.agg(...) with a list of functions fails when a custom function is included in the list

Question

I am looking to aggregate a bunch of columns with two functions each: np.mean and quart_1. All columns are numeric. np.mean is imported from numpy, and quart_1 is a custom function that returns the first quartile of a column:

import numpy as np

def quart_1(x):
    return np.percentile(x, 25)

The issue is that if I do df.agg([np.mean, quart_1]), I get KeyError: no results.

It appears that, when I include quart_1 in a list, it returns the original DataFrame without any aggregation (only with an extra level of column labels). When I pass the bare function without a list, it performs the aggregation and returns a pandas Series (see the commands and outputs below).

Feeding it quart_1 as a scalar:

df.select_dtypes([np.number]).agg(quart_1)

accelerometer_x     0.445186
accelerometer_y    10.619320
dtype: float64

...and inside of a list:

df.select_dtypes([np.number]).agg([quart_1])

                            accelerometer_x accelerometer_y
                                quart_1         quart_1
gps_time                                               
2017-07-27 18:35:14.660        0.519700       10.703300
2017-07-27 18:35:14.665        0.474200       10.684200
2017-07-27 18:35:14.670        0.474200       10.684200
2017-07-27 18:35:14.675        0.574800       10.633900
2017-07-27 18:35:14.680        0.574800       10.633900
2017-07-27 18:35:14.685        0.528103       10.657099
2017-07-27 18:35:14.690        0.476600       10.681800
2017-07-27 18:35:14.695        0.446749       10.694255
2017-07-27 18:35:14.700        0.476600       10.681800
2017-07-27 18:35:14.705        0.574800       10.643500
2017-07-27 18:35:14.710        0.574800       10.643500

DataFrame.agg is relatively new, so I guess this is a bug. Before df.agg existed, we used this hack: `df.groupby(np.ones(len(df))).agg([np.mean, quart_1])` - ayhan, Aug 04 '17 at 23:55
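
For reference, the groupby workaround from the comment above could look roughly like the sketch below. The sample values are made up for illustration (they are not the asker's data), and the string "mean" is used in place of np.mean, which is an assumption on my part rather than what the comment wrote.

```python
import numpy as np
import pandas as pd

def quart_1(x):
    # First quartile of one column within one group
    return np.percentile(x, 25)

# Made-up sample values standing in for the real sensor data
df = pd.DataFrame({
    "accelerometer_x": [0.0, 1.0, 2.0, 3.0, 4.0],
    "accelerometer_y": [10.0, 11.0, 12.0, 13.0, 14.0],
})

# A constant key puts every row into a single group, so each function
# is called once per column on that whole group.
key = np.ones(len(df))
result = df.groupby(key).agg(["mean", quart_1])
print(result)
```

The result has a single row (the one group) and two-level column labels: the original column name on top and the function name underneath.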

score 0 · Answer 1 · answered Aug 04 '17 at 23:58

It turns out that, for some reason, the dimension of the input vector changes when the function is passed within a list. Replacing x with x.T within quart_1 did the trick. I also have additional percentiles not mentioned here, and using both quart_1 and quart_3 (the same as quart_1, but with 25 replaced by 75) seemed to cause some kind of name clash. I fixed that by using a percentile function as mentioned in this answer.
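
The linked answer is not reproduced here, so the following is only a sketch of one way such a percentile function might be written. The factory name `percentile`, the `percentile_<n>` labels, and the sample values are all assumptions for illustration. It also swaps np.percentile for Series.quantile: that method exists only on a Series, so a scalar input raises an error instead of being passed through unchanged.

```python
import pandas as pd

def percentile(n):
    """Build an aggregation function for the n-th percentile (hypothetical helper)."""
    def percentile_(x):
        # Series.quantile takes a fraction in [0, 1]; a plain float has no
        # such method, so this function only works on a whole column.
        return x.quantile(n / 100)
    # Give every generated function its own name so the result labels differ
    percentile_.__name__ = f"percentile_{n}"
    return percentile_

# Made-up sample values standing in for the real sensor data
df = pd.DataFrame({
    "accelerometer_x": [0.0, 1.0, 2.0, 3.0, 4.0],
    "accelerometer_y": [10.0, 11.0, 12.0, 13.0, 14.0],
})

summary = df.agg(["mean", percentile(25), percentile(75)])
print(summary)
```

Each generated function carries a distinct `__name__`, so the first and third quartile rows get separate labels in the output.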
