2

I have a dataframe with 2 columns 'Date' and 'Temp':

              Temp
Date               
2017-08-31  43.8532
2017-08-30  44.1702
2017-08-29  43.7581
2017-08-28  43.8680
2017-08-25  43.5181
2017-08-24  43.9274
2017-08-23  43.6600
2017-08-22  43.5773
2017-08-21  43.7247
2017-08-18  43.7179
2017-08-17  43.7072
2017-08-16  43.3215
2017-08-15  43.3597
2017-08-14  43.3082
2017-08-11  43.5351
2017-08-10  43.6848
...         ...
31/08/2016  37.873
30/08/2016  38.6831
29/08/2016  38.7264
26/08/2016  39.1932
25/08/2016  39.1318
...         ...

I wish to create 2 new columns '1 yr diff' and '1 yr std'

'1 yr diff' is the difference in temperature between 2 dates a year apart.

'1 yr std' is the standard deviation in temperature during the past year.

Calculated by hand for the first three:

Date    Temp    1 yr diff   1 yr std
31/08/2017  43.8532 5.9802  1.914
30/08/2017  44.1702 5.4871  1.909
29/08/2017  43.7581 5.0317  1.901

How do I do this for all the rows?

user44840
  • 311
  • 2
  • 9

2 Answers2

1

Give this a shot:

one_year = pandas.offsets.Year(1)
df = (
    df.assign(diff_1yr=df['Temp'] - df['Temp'].shift(freq=one_year))
      .assign(std_1yr=df['Temp'].rolling(one_year).std())
)
Paul H
  • 65,268
  • 20
  • 159
  • 136
1
diff = df.Temp.resample('D').ffill().diff(365).rename('1 yr diff')
std = df.Temp.rolling('365D').std().rename('1 yr std')
pd.concat([df, diff, std], axis=1).dropna()

                Temp  1 yr diff  1 yr std
Date                                     
2017-06-30  0.467765  -0.421425  0.287502
2017-06-29  0.293493  -0.479262  0.288638
2017-06-28  0.089505  -0.080712  0.288987
2017-06-27  0.156842  -0.254886  0.288611
2017-06-26  0.061712  -0.286930  0.287945
2017-06-23  0.691029   0.623226  0.286886
2017-06-22  0.135094  -0.120836  0.287729
2017-06-21  0.597864  -0.286051  0.287270
2017-06-20  0.594055   0.008702  0.288266
2017-06-19  0.823582   0.509880  0.288254
...
piRSquared
  • 285,575
  • 57
  • 475
  • 624