How to find a 10 year average of a column in a data frame?

Question

The data looks like this:

Date              Company                     Price EPS   CPI
0 1975-04-30      3M Co                       0     0     53.0
1 1975-04-30      Abbott Laboratories         0     0     53.0
2 1975-04-30      AbbVie Inc                  0     0     53.0
3 1975-04-30      Accenture PLC               0     0     53.0
4 1975-04-30      Activision Blizzard Inc     0     0     53.0

I want to calculate 10 year Earnings average and divide it by Price of the next year. How to do that in Python? I tried the following:-

CAPE.groupby(['Company','EPS']/10).mean()

But the output is not in the right format. Any help would be appreciated

P.S.: CAPE is the name of the data frame.

Cory Madden · Accepted Answer · 2017-08-09T16:33:25.857

I think this might help you:

In [79]: df.groupby([(df.Date.dt.year // 10 * 10), 'EPS']).mean()
Out[79]: 
          Price   CPI
Date EPS             
1970 0        0  53.0
1980 0        0  53.0
1990 0        0  53.0
2000 0        0  53.0
2010 0        0  53.0

I modified the data little bit so the years would group differently.

The data you provided wasn't really sufficient to test it, so I made some additional data that looks like this:

Date,Company,Price,EPS,CPI
1985-01-30,3M Co,0,2,56.0
1986-04-30,3M Co,0,3,93.0
1983-06-30,3M Co,0,4,18.0
1979-04-30,Abbott Laboratories,0,5,52.0
1972-03-30,Abbott Laboratories,0,6,73.0
1971-09-30,Abbott Laboratories,0,7,58.0
1995-04-30,AbbVie Inc,0,8,53.0
2015-04-30,Accenture PLC,0,9,53.0
2005-04-30,Activision Blizzard Inc,0,0,53.0

And performed the following actions:

In [83]: df=pd.DataFrame.from_csv('t.csv', index_col=None)

In [84]: df.Date = df.Date.apply(lambda x: pd.to_datetime(x))

In [85]: df.groupby([(df.Date.dt.year // 10 * 10), 'EPS']).mean()
Out[85]: 
          Price   CPI
Date EPS             
1970 5        0  52.0
     6        0  73.0
     7        0  58.0
1980 2        0  56.0
     3        0  93.0
     4        0  18.0
1990 8        0  53.0
2000 0        0  53.0
2010 9        0  53.0

Ok, assuming the same dataframe, we can set the index to Date and perform a rolling mean grouped by decade, I believe. Here's the code:

In [52]: df = df.set_index('Date')

In [53]: df.groupby(df.index.year // 10 * 10).rolling('3650d').mean()
Out[53]: 
                                 Company  Price  EPS        CPI
Date Date                                                      
1970 1979-04-30      Abbott Laboratories    0.0  5.0  52.000000
     1972-03-30      Abbott Laboratories    0.0  5.5  62.500000
     1971-09-30      Abbott Laboratories    0.0  6.0  61.000000
1980 1985-01-30                    3M Co    0.0  2.0  56.000000
     1986-04-30                    3M Co    0.0  2.5  74.500000
     1983-06-30                    3M Co    0.0  3.0  55.666667
1990 1995-04-30               AbbVie Inc    0.0  8.0  53.000000
2000 2005-04-30  Activision Blizzard Inc    0.0  0.0  53.000000
2010 2015-04-30            Accenture PLC    0.0  9.0  53.000000

Grouped by decade and company:

In [67]: df.groupby([df.index.year // 10 * 10, df.Company]).rolling('3650d').mean()
    ...: 
Out[67]: 
                                                         Company  Price  EPS  \
Date Company                 Date                                              
1970 Abbott Laboratories     1979-04-30      Abbott Laboratories    0.0  5.0   
                             1972-03-30      Abbott Laboratories    0.0  5.5   
                             1971-09-30      Abbott Laboratories    0.0  6.0   
1980 3M Co                   1985-01-30                    3M Co    0.0  2.0   
                             1986-04-30                    3M Co    0.0  2.5   
                             1983-06-30                    3M Co    0.0  3.0   
1990 AbbVie Inc              1995-04-30               AbbVie Inc    0.0  8.0   
2000 Activision Blizzard Inc 2005-04-30  Activision Blizzard Inc    0.0  0.0   
2010 Accenture PLC           2015-04-30            Accenture PLC    0.0  9.0   

                                               CPI  
Date Company                 Date                   
1970 Abbott Laboratories     1979-04-30  52.000000  
                             1972-03-30  62.500000  
                             1971-09-30  61.000000  
1980 3M Co                   1985-01-30  56.000000  
                             1986-04-30  74.500000  
                             1983-06-30  55.666667  
1990 AbbVie Inc              1995-04-30  53.000000  
2000 Activision Blizzard Inc 2005-04-30  53.000000  
2010 Accenture PLC           2015-04-30  53.000000

The 10 year averages should be for years 1990 (10 years after the data starts till 2017) and for every year in between (1991,1992,1993 .....2017) — Abhay Saini, Aug 09 '17 at 14:37
Your data starts in the 70's, not the 80's. I don't understand what you're saying. This does average the decades as requested. — Cory Madden, Aug 09 '17 at 14:47
for date jan 31 1990, the average should be for the dates jan 31, 1980-jan 31-1990, for date feb 28 1990, the average should be for feb 28 1980-feb 28 1990 and so on — Abhay Saini, Aug 09 '17 at 15:06
something like what happens in this link :- https://stackoverflow.com/questions/40060842/moving-average-pandas — Abhay Saini, Aug 09 '17 at 15:07
@AbhaySaini I believe I may understand now. Please check the last example in my answer. — Cory Madden, Aug 09 '17 at 16:23
Before trying that, can you add one small thing to your code? I actually want the 10 year averages for each company separately, like:- for activision, from 1975 to 2017, i will have 10 year averages starting from 1985 to 2017 (for each individual date). I tried this code for that (if you may wanna check it out, it gives error though) :- CAPE['Earnings_Average'] = CAPE.groupby('Company',as_index=False).apply(lambda x: x.Real_Earnings.rolling(window=120).mean()) The 120 is the number of months since the data is monthly — Abhay Saini, Aug 09 '17 at 16:27
Check my last example now. Using `window` doesn't work for me. That's all I can do for now. I do have a job :) If that doesn't work, I'll leave this here anyway so it can possibly help someone else that googles your question. — Cory Madden, Aug 09 '17 at 16:34
Thanks for your efforts, but iam getting the error :- AttributeError: 'Int64Index' object has no attribute 'year' — Abhay Saini, Aug 09 '17 at 16:37
Try to cast it as a datetime as I did in my initial steps. `df.Date = df.Date.apply(lambda x: pd.to_datetime(x))` — Cory Madden, Aug 09 '17 at 16:50
Did that! Thanks but the date came as the first value so...will have to extract only the last column from this — Abhay Saini, Aug 09 '17 at 17:30
Are you saying that this is the correct result? If you just want the last column just do: `df.groupby([df.index.year // 10 * 10, df.Company]).rolling('3650d').mean().CPI` I also had to apply `.to_pydatetime()` before setting the index. so really `df.Date = df.Date.apply(lambda x: pd.to_datetime(x).to_pydatetime())` — Cory Madden, Aug 09 '17 at 18:25

How to find a 10 year average of a column in a data frame?

1 Answers1