Get mean per column per partition in Pandas

Question

I am trying to get a mean value per column per partition for a DataFrame such as this one:

  country      city  sales  stock
0      UK    London      1     34
1      UK     Leeds      2     20
2      UK     Leeds      3     21
3      RO      Cluj      4     24
4      RO      Cluj      5     25
5      RO Bucharest      6     25

That is, I want to get an average of both sales and stock, and aggregating these into unique combinations of country and city. The resulting DataFrame should therefore be:

  country      city  sales  stock
0      UK    London      1     34
1      UK     Leeds    2.5   20.5
2      RO      Cluj    4.5   24.5
3      RO Bucharest      6     25

where duplicate rows of my country-city partition have been aggregated into a single row, with mean values.

I studied the documentation on pandas.DataFrame.mean() and SO questions & answers such as this one, but none help me in a straightforward manner. Any help appreciated.

Try this? https://stackoverflow.com/questions/46431243/pandas-dataframe-groupby-how-to-get-sum-of-multiple-columns — Vaishali, Jun 11 '18 at 15:18

score 3 · Accepted Answer · edited Nov 13 '21 at 09:58

3

groupby

df.groupby(['country', 'city']).mean()

                   sales  stock
country city                   
RO      Bucharest    6.0   25.0
        Cluj         4.5   24.5
UK      Leeds        2.5   20.5
        London       1.0   34.0

Setting the index

df.set_index(['country', 'city']).mean(level=[0, 1])

Without setting the index

df.groupby(['country', 'city'], as_index=False, sort=False).mean()


  country       city  sales  stock
0      UK     London    1.0   34.0
1      UK      Leeds    2.5   20.5
2      RO       Cluj    4.5   24.5
3      RO  Bucharest    6.0   25.0

edited Nov 13 '21 at 09:58

Bogdan Doicin

2,342
5
25
34

answered Jun 11 '18 at 15:17

piRSquared

285,575
57
475
624

2

To be precise, `df.groupby(['country', 'city'], as_index=False, sort=False).mean()` – cs95 Jun 11 '18 at 15:18

Get mean per column per partition in Pandas

1 Answers1