What does data.groupby(cuts).outcome.agg do in the pandas library?

Question

I can't understand what this function does. Here is the code context:

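    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    #prob and data are defined earlier (not shown)
    #group outcomes into bins of similar probability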
    bins = np.linspace(0, 1, 20)
    cuts = pd.cut(prob, bins)
    print(cuts)
    binwidth = bins[1] - bins[0]

    #freshness ratio and number of examples in each bin
    cal = data.groupby(cuts).outcome.agg(['mean', 'count'])
    print(cal['count'])
    print(cal['mean'])
    cal['pmid'] = (bins[:-1] + bins[1:]) / 2
    cal['sig'] = np.sqrt(cal.pmid * (1 - cal.pmid) / cal['count'])

    #the calibration plot
    ax = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
    p = plt.errorbar(cal.pmid, cal['mean'], cal['sig'])
    plt.plot(cal.pmid, cal.pmid, linestyle='--', lw=1, color='k')
    plt.ylabel("Empirical Fraction")

Do they not provide documentation for their APIs? – takendarkk, Jan 27 '17 at 20:17

score 0 · Answer 1 · edited May 23 '17 at 12:24

data is a DataFrame containing a column named outcome. The salient part of your code is:

    cal = data.groupby(cuts).outcome.agg(['mean', 'count'])

What this does is, in order:

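Group the rows of data by the bin labels in cuts. Note that cuts is not a column of data: it is the result of pd.cut(prob, bins), which assigns each probability to one of the intervals defined by bins, so rows whose probability falls in the same interval end up in the same group (see the pandas documentation for DataFrame.groupby).
Fetch the SeriesGroupBy corresponding to the "outcome" column.
Compute the mean and the count of "outcome" within each group. The result is a DataFrame indexed by bin, with one column per aggregation, "mean" and "count" (see the pandas documentation for agg).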
Assign that to the cal variable.
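
The steps above can be tried on a small example. This is only a sketch: the probabilities, the outcomes and the choice of four bins are made up for illustration and are not the data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the question): predicted
# probabilities and the matching 0/1 outcomes, one per row.
prob = pd.Series([0.05, 0.10, 0.30, 0.35, 0.40, 0.90])
data = pd.DataFrame({"outcome": [0, 0, 1, 0, 1, 1]})

# Five edges give four bins: (0, 0.25], (0.25, 0.5], (0.5, 0.75], (0.75, 1]
bins = np.linspace(0, 1, 5)

# Label each probability with the interval it falls into.
cuts = pd.cut(prob, bins)

# Group rows by interval, pick the outcome column, and compute the
# mean and count per interval. observed=False keeps empty bins.
cal = data.groupby(cuts, observed=False).outcome.agg(["mean", "count"])
print(cal)

# count per bin: 2, 3, 0, 1
# mean per bin:  0.0, 0.667 (2 of 3), NaN (empty bin), 1.0
```

With the default arguments pd.cut uses intervals that are open on the left and closed on the right, so a probability of exactly 0 would not fall in any bin here and would be left out of the groups.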
