2

I am trying to get a mean value per column per partition for a DataFrame such as this one:

  country      city  sales  stock
0      UK    London      1     34
1      UK     Leeds      2     20
2      UK     Leeds      3     21
3      RO      Cluj      4     24
4      RO      Cluj      5     25
5      RO Bucharest      6     25

That is, I want to get an average of both sales and stock, and aggregating these into unique combinations of country and city. The resulting DataFrame should therefore be:

  country      city  sales  stock
0      UK    London      1     34
1      UK     Leeds    2.5   20.5
2      RO      Cluj    4.5   24.5
3      RO Bucharest      6     25

where duplicate rows of my country-city partition have been aggregated into a single row, with mean values.

I studied the documentation on pandas.DataFrame.mean() and SO questions & answers such as this one, but none help me in a straightforward manner. Any help appreciated.

Chris
  • 1,173
  • 11
  • 19
  • Try this? https://stackoverflow.com/questions/46431243/pandas-dataframe-groupby-how-to-get-sum-of-multiple-columns – Vaishali Jun 11 '18 at 15:18

1 Answers1

3

groupby

df.groupby(['country', 'city']).mean()

                   sales  stock
country city                   
RO      Bucharest    6.0   25.0
        Cluj         4.5   24.5
UK      Leeds        2.5   20.5
        London       1.0   34.0

Setting the index

df.set_index(['country', 'city']).mean(level=[0, 1])

Without setting the index

df.groupby(['country', 'city'], as_index=False, sort=False).mean()


  country       city  sales  stock
0      UK     London    1.0   34.0
1      UK      Leeds    2.5   20.5
2      RO       Cluj    4.5   24.5
3      RO  Bucharest    6.0   25.0
Bogdan Doicin
  • 2,342
  • 5
  • 25
  • 34
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • 2
    To be precise, `df.groupby(['country', 'city'], as_index=False, sort=False).mean()` – cs95 Jun 11 '18 at 15:18