1

Image of dataset

I am unsure as to how to calculate the mean for a column given specific rows. I need to calculate the mean of the column Mkt-RF by decade, as in the mean from 193001 to 193912, and so on. I need to do this for each decade until 2016.

Is there also any way to put the results into a new dataframe of its own? With the decade (1920,1930) in one column and the mean of each decade in the other?

Jordfräs
  • 8,903
  • 3
  • 27
  • 30
JSC
  • 181
  • 2
  • 12

1 Answers1

1

I think you need groupby by first 3 chars of first column by str[:3] with mean:

df = df['Mkt-RF'].groupby(df['Unnamed:0'].str[:3]).mean()

Sample:

df = pd.DataFrame({'Unnamed:0':['192607','192608','193609','193610','193611'],
                   'Mkt-RF':[4,5,6,7,5]})
print (df)   
   Mkt-RF Unnamed:0
0       4    192607
1       5    192608
2       6    193609
3       7    193610
4       5    193611

#rename column
df = df.rename(columns={'Unnamed:0':'YEARMONTH'})

df = df['Mkt-RF'].groupby(df.YEARMONTH.str[:3]).mean().rename('MEAN').reset_index()
df.YEARMONTH = (df.YEARMONTH + '0').astype(int)
print (df)
   YEARMONTH  MEAN
0       1920   4.5
1       1930   6.0

Another solution is convert first to_datetime and groupby by year floor divided by 10:

df = df.rename(columns={'Unnamed:0':'YEARMONTH'})

df.YEARMONTH = pd.to_datetime(df.YEARMONTH, format='%Y%m')
df = df['Mkt-RF'].groupby(df.YEARMONTH.dt.year // 10).mean().rename('MEAN').reset_index()
df.YEARMONTH = df.YEARMONTH *10
print (df)
   YEARMONTH  MEAN
0       1920   4.5
1       1930   6.0
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • It returns this error for me? "Can only use .str accessor with string values, which use np.object_ dtype in pandas" Not sure what it means. – JSC Jan 11 '17 at 07:08
  • It meenas first column is not string. So you can use `df = df['Mkt-RF'].groupby(df['Unnamed:0'].astype(str).str[:3]).mean()` or second solution. – jezrael Jan 11 '17 at 07:09
  • 2
    and small advice - people of `StackOverflow` hate pictures instead text, because text cannot be copied. So you get downvote(s). Also this [answers](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) can help as create nice pandas question. – jezrael Jan 11 '17 at 07:18
  • 1
    Thanks! The second solution works perfectly. Is there any way to add a column with the standard deviations of each decade as well? – JSC Jan 11 '17 at 07:32
  • 1
    Yes, you can use `df = df['Mkt-RF'].groupby(df.YEARMONTH.dt.year // 10).agg(['mean', 'std']).add_prefix('_').reset_index()` – jezrael Jan 11 '17 at 07:38