4

I have a pandas dataframe df, which has GDP values alsong with yyyy-mm TimePeriod index.

import numpy as np
import pandas as pd
import pandas_datareader.data as web
gdp = web.DataReader("GDP", "fred", start, end).resample('M').mean().interpolate(method='linear').round().to_period('M')

Date        GDP
2015-07    16528.0
2015-08    16534.0
2015-09    16541.0
2015-10    16548.0
2015-11    16556.0
2015-12    16564.0
2016-01    16572.0
2016-02    16602.0
2016-03    16633.0
2016-04    16664.0
2016-05    16702.0
2016-06    16740.0
2016-07    16778.0
2016-08    16803.0
2016-09    16827.0
2016-10    16851.0
2016-11    16869.0
2016-12    16886.0
2017-01    16903.0
2017-02    16946.0
2017-03    16988.0
2017-04    17031.0
2017-05    17075.0
2017-06    17120.0
2017-07    17164.0
2017-08        NaN
2017-09        NaN
2017-10        NaN
2017-11        NaN
2017-12        NaN

GDP is published quarterly. The latest data point is 2017 Q3. So I resampled to have monthly values and interpolated when the values were missing. How do I extrapolate to fill up the remaining of the NaN for the rest of the year by using spline or 3 month moving average etc? I have seen some examples using polynomial, but that look like overdoing stuff (pandas extrapolation of polynomial). I was wondering if there is a simpler approach. Thank you!

Prabhakar
  • 1,138
  • 2
  • 14
  • 30
Zenvega
  • 1,974
  • 9
  • 28
  • 45

1 Answers1

4

By using interpolate

df.GDP=df.GDP.interpolate(method='spline', order=2)

df
Out[197]: 
       Date           GDP
0   2015-07  16528.000000
1   2015-08  16534.000000
2   2015-09  16541.000000
3   2015-10  16548.000000
4   2015-11  16556.000000
5   2015-12  16564.000000
6   2016-01  16572.000000
7   2016-02  16602.000000
8   2016-03  16633.000000
9   2016-04  16664.000000
10  2016-05  16702.000000
11  2016-06  16740.000000
12  2016-07  16778.000000
13  2016-08  16803.000000
14  2016-09  16827.000000
15  2016-10  16851.000000
16  2016-11  16869.000000
17  2016-12  16886.000000
18  2017-01  16903.000000
19  2017-02  16946.000000
20  2017-03  16988.000000
21  2017-04  17031.000000
22  2017-05  17075.000000
23  2017-06  17120.000000
24  2017-07  17164.000000
25  2017-08  17211.095399
26  2017-09  17258.357329
27  2017-10  17306.504998
28  2017-11  17355.538404
29  2017-12  17405.457549

Data input

df
Out[195]: 
       Date      GDP
0   2015-07  16528.0
1   2015-08  16534.0
2   2015-09  16541.0
3   2015-10  16548.0
4   2015-11  16556.0
5   2015-12  16564.0
6   2016-01  16572.0
7   2016-02  16602.0
8   2016-03  16633.0
9   2016-04  16664.0
10  2016-05  16702.0
11  2016-06  16740.0
12  2016-07  16778.0
13  2016-08  16803.0
14  2016-09  16827.0
15  2016-10  16851.0
16  2016-11  16869.0
17  2016-12  16886.0
18  2017-01  16903.0
19  2017-02  16946.0
20  2017-03  16988.0
21  2017-04  17031.0
22  2017-05  17075.0
23  2017-06  17120.0
24  2017-07  17164.0
25  2017-08      NaN
26  2017-09      NaN
27  2017-10      NaN
28  2017-11      NaN
29  2017-12      NaN
BENY
  • 317,841
  • 20
  • 164
  • 234
  • 1
    But, I already tried interpolate in my above code. It doesn't extrapolate to the '2017-12'. It stops at '2017-07'. – Zenvega Dec 28 '17 at 02:44
  • @Zenvega see the order=2 , do you add that ? – BENY Dec 28 '17 at 02:44
  • @Zenvega this is what I get base on your gdp dataset – BENY Dec 28 '17 at 02:52
  • @Zenvega adding the data input and do not forget assign it back – BENY Dec 28 '17 at 02:55
  • I am getting the error: {TypeError: float() argument must be a string or a number, not 'Period'} – Zenvega Dec 28 '17 at 03:33
  • @Zenvega I am confused , Do you using the data you show to us ? I am using the data `gdp = web.DataReader("GDP", "fred", start, end).resample('M').mean().interpolate(method='linear').round().to_period('M') `, then do the interpolate again – BENY Dec 28 '17 at 03:50
  • I have found work around, I think it is because of the way my data is set up. I will give it a try another time to get it to work the way I want. I appreciate your help! thanks! – Zenvega Dec 28 '17 at 17:21