2

How do I organise this three-column dataset by removing the repeating elements?

Country    Year    Temperature
US         1990    25
US         1990    27
US         1990    24
US         1991    26
Canada     1990    20
.          .       .

Into

Country    Year    AvgTemp
US         1990    25.33
US         1991    26
Canada     1990    20

I can use groupby to do this when only the 'Year' and 'Temperature' columns are involved. But what if three columns are involved?

(P.S. I am new to pandas.)

  • This is just: `df.groupby(['Country', 'Year'])['Temperature'].mean()` – Erfan Jun 14 '20 at 16:32
  • To match your expected output with the new column name, use named aggregations instead: `df.groupby(['Country', 'Year']).agg(AvgTemp=('Temperature', 'mean')).reset_index()` – Erfan Jun 14 '20 at 16:35

2 Answers

1

You can pass several columns to groupby() as a list, like this:

df.groupby(['Country', 'Year'])['Temperature'].mean().reset_index()
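For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the five rows shown in the question and runs the grouping on them. The `df` and `result` names are just for illustration, and the column is assumed to be called `Temperature` as in the question's table.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Country": ["US", "US", "US", "US", "Canada"],
    "Year": [1990, 1990, 1990, 1991, 1990],
    "Temperature": [25, 27, 24, 26, 20],
})

# Group on both key columns, average the temperatures within each
# (Country, Year) pair, then turn the group keys back into columns.
result = df.groupby(["Country", "Year"])["Temperature"].mean().reset_index()
print(result)
```

By default groupby sorts the group keys, so the Canada row comes first in `result`. Passing `sort=False` to groupby should keep the groups in order of first appearance, which matches the layout in the question.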
Ch3steR
DataVizPyR
1
df.groupby(['Country', 'Year']).mean().reset_index().rename(columns={'Temperature':'AvgTemp'})
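A small sketch of the rename approach on the question's sample rows, next to the named-aggregation form from the comments, to check that both give the same table. Two things here are my own assumptions rather than part of the one-liner above: I select `['Temperature']` before calling `mean()`, so that any extra non-numeric column in a real dataset would not get in the way, and I pass `sort=False` to keep the row order from the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Country": ["US", "US", "US", "US", "Canada"],
    "Year": [1990, 1990, 1990, 1991, 1990],
    "Temperature": [25, 27, 24, 26, 20],
})

# Variant 1: average, flatten the index, then rename the value column.
renamed = (
    df.groupby(["Country", "Year"], sort=False)["Temperature"]
    .mean()
    .reset_index()
    .rename(columns={"Temperature": "AvgTemp"})
)

# Variant 2: named aggregation sets the output column name directly.
named = (
    df.groupby(["Country", "Year"], sort=False)
    .agg(AvgTemp=("Temperature", "mean"))
    .reset_index()
)

print(renamed)
print(renamed.equals(named))
```

Which one to use is mostly a matter of taste. The named aggregation avoids the separate rename step and scales more naturally if you later want several statistics at once.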
warped