How to calculate the mean of a column by decade in Python

Question

I am unsure as to how to calculate the mean for a column given specific rows. I need to calculate the mean of the column Mkt-RF by decade, as in the mean from 193001 to 193912, and so on. I need to do this for each decade until 2016.

Is there also any way to put the results into a new dataframe of its own? With the decade (1920,1930) in one column and the mean of each decade in the other?

jezrael · Accepted Answer · 2017-01-11T07:08:45.067

1

I think you need groupby by first 3 chars of first column by str[:3] with mean:

df = df['Mkt-RF'].groupby(df['Unnamed:0'].str[:3]).mean()

Sample:

df = pd.DataFrame({'Unnamed:0':['192607','192608','193609','193610','193611'],
                   'Mkt-RF':[4,5,6,7,5]})
print (df)   
   Mkt-RF Unnamed:0
0       4    192607
1       5    192608
2       6    193609
3       7    193610
4       5    193611

#rename column
df = df.rename(columns={'Unnamed:0':'YEARMONTH'})

df = df['Mkt-RF'].groupby(df.YEARMONTH.str[:3]).mean().rename('MEAN').reset_index()
df.YEARMONTH = (df.YEARMONTH + '0').astype(int)
print (df)
   YEARMONTH  MEAN
0       1920   4.5
1       1930   6.0

Another solution is convert first to_datetime and groupby by year floor divided by 10:

df = df.rename(columns={'Unnamed:0':'YEARMONTH'})

df.YEARMONTH = pd.to_datetime(df.YEARMONTH, format='%Y%m')
df = df['Mkt-RF'].groupby(df.YEARMONTH.dt.year // 10).mean().rename('MEAN').reset_index()
df.YEARMONTH = df.YEARMONTH *10
print (df)
   YEARMONTH  MEAN
0       1920   4.5
1       1930   6.0

edited Jan 11 '17 at 07:08

answered Jan 11 '17 at 06:51

jezrael

822,522
95
1,334
1,252

It returns this error for me? "Can only use .str accessor with string values, which use np.object_ dtype in pandas" Not sure what it means. – JSC Jan 11 '17 at 07:08
It meenas first column is not string. So you can use `df = df['Mkt-RF'].groupby(df['Unnamed:0'].astype(str).str[:3]).mean()` or second solution. – jezrael Jan 11 '17 at 07:09
2

and small advice - people of `StackOverflow` hate pictures instead text, because text cannot be copied. So you get downvote(s). Also this [answers](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) can help as create nice pandas question. – jezrael Jan 11 '17 at 07:18
1

Thanks! The second solution works perfectly. Is there any way to add a column with the standard deviations of each decade as well? – JSC Jan 11 '17 at 07:32
1

Yes, you can use `df = df['Mkt-RF'].groupby(df.YEARMONTH.dt.year // 10).agg(['mean', 'std']).add_prefix('_').reset_index()` – jezrael Jan 11 '17 at 07:38

How to calculate the mean of a column by decade in Python

1 Answers1