
I have a sample of student scores along with the population (number of students) at each score:

import numpy as np
import pandas as pd

# Create the DataFrame
sample = pd.DataFrame(
    {'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586, 585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
     'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809, 815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

Then I calculate the weighted average of the scores:

np.average(sample['score'], weights=sample['population'])
# 584.9062443219672
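For reference, `np.average` with `weights` computes sum(score × population) / sum(population). A minimal check that spells the formula out by hand on the same data (the names `manual` and `builtin` are just illustrative):

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame(
    {'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
               585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
     'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                    815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

# Weighted mean: total of score * population divided by the total population
manual = (sample['score'] * sample['population']).sum() / sample['population'].sum()

# Same quantity via NumPy's built-in weighted average
builtin = np.average(sample['score'], weights=sample['population'])

print(manual, builtin)
```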

However, when I run sample.describe(), the weights are not taken into account:

sample.describe()

           score   population
count   20.00000    20.000000
mean   585.50000   825.550000
std      5.91608    91.465539
min    576.00000   705.000000
25%    580.75000   745.750000
50%    585.50000   815.500000
75%    590.25000   878.500000
max    595.00000  1018.000000

How can I get the weights included in sample.describe()?

AbstProcDo
Comment: Maybe you can find the answer in this post: https://stackoverflow.com/a/47368071/17931594 – Tessa Jul 13 '23 at 10:21

1 Answer


You need a custom function. Because the weighted average is a single scalar, assigning it to a new row puts the same value in every column:

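def describe(df, stats):
    # Start from the usual (unweighted) summary
    d = df.describe()
    # Add a row labelled `stats` holding the weighted average of 'score';
    # the scalar is broadcast, so every column of that row gets the same value
    d.loc[stats] = np.average(df['score'], weights=df['population'])
    return d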

out = describe(sample, 'wa')
print(out)
            score   population
count   20.000000    20.000000
mean   585.500000   825.550000
std      5.916080    91.465539
min    576.000000   705.000000
25%    580.750000   745.750000
50%    585.500000   815.500000
75%    590.250000   878.500000
max    595.000000  1018.000000
wa     584.906244   584.906244
jezrael
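Because `population` here is an integer count of students at each score, another option is to expand each score by its count with `Series.repeat` and call `describe()` on the expanded Series, so that every statistic (count, mean, std and the percentiles) reflects the weights. This is a sketch that assumes integer frequency weights; it does not apply to arbitrary real-valued weights, and memory grows with the total count (16,511 rows here):

```python
import pandas as pd

sample = pd.DataFrame(
    {'score': [595, 594, 593, 592, 591, 590, 589, 588, 587, 586,
               585, 584, 583, 582, 581, 580, 579, 578, 577, 576],
     'population': [705, 745, 716, 742, 722, 746, 796, 750, 816, 809,
                    815, 821, 820, 865, 876, 886, 947, 949, 1018, 967]})

# Repeat each score once per student (assumes 'population' holds integer counts)
expanded = sample['score'].repeat(sample['population'].to_numpy())

# describe() on the expanded Series gives frequency-weighted statistics
stats = expanded.describe()
print(stats)
```

The `mean` row of this output matches the `np.average` result, and the `50%` row becomes the weighted median (585 for this data) instead of the unweighted 585.5.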