I can't understand what this function does. Here is the code in context:

    #group outcomes into bins of similar probability
    bins = np.linspace(0, 1, 20)
    cuts = pd.cut(prob, bins)
    print(cuts)
    binwidth = bins[1] - bins[0]

    #freshness ratio and number of examples in each bin
    cal = data.groupby(cuts).outcome.agg(['mean', 'count'])
    print(cal['count'])
    print(cal['mean'])
    cal['pmid'] = (bins[:-1] + bins[1:]) / 2
    cal['sig'] = np.sqrt(cal.pmid * (1 - cal.pmid) / cal['count'])

    #the calibration plot
    ax = plt.subplot2grid((3, 1), (0, 0), rowspan=2)
    p = plt.errorbar(cal.pmid, cal['mean'], cal['sig'])
    plt.plot(cal.pmid, cal.pmid, linestyle='--', lw=1, color='k')
    plt.ylabel("Empirical Fraction")
Munawir

1 Answer

data is a DataFrame containing a column named outcome. The salient part of your code is:

    cal = data.groupby(cuts).outcome.agg(['mean', 'count'])

What this does is, in order:

  1. Group the rows of data by the bin labels in cuts (the result of pd.cut(prob, bins)), so rows with similar predicted probability land in the same group.
  2. Fetch the SeriesGroupBy corresponding to the "outcome" column.
  3. Create a DataFrame with two columns, "mean" and "count", holding the mean of outcome and the number of non-null outcome values in each group.
  4. Assign that to the cal variable.
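
The steps above can be tried on a small example. This is only a minimal sketch: the toy values, the four-bin layout, and the "prob" column name are made up for illustration and are not from your data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: predicted probabilities and 0/1 outcomes.
data = pd.DataFrame({
    "prob":    [0.10, 0.15, 0.60, 0.70, 0.90],
    "outcome": [0,    1,    1,    0,    1],
})

# Five edges (0, 0.25, 0.5, 0.75, 1) give four bins.
bins = np.linspace(0, 1, 5)

# One bin label per row, e.g. (0.0, 0.25] for prob = 0.10.
cuts = pd.cut(data["prob"], bins)

# Steps 1-4: group by bin, pick "outcome", compute mean and count per bin.
# observed=False keeps bins that contain no rows.
cal = data.groupby(cuts, observed=False).outcome.agg(["mean", "count"])
print(cal)
```

Here the first and third bins each hold two rows with a mean outcome of 0.5, the last bin holds one row with a mean of 1.0, and the second bin is empty (count 0, mean NaN). Keeping empty bins matters for your code, because the cal['pmid'] assignment needs exactly one row per bin.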
Aleksey Bilogur