3

I am trying to get a fairly basic resampling method to work with a pandas data frame. My data frame df is indexed by datetime entries and contains prices

                               price
datetime                            
2000-08-16 09:29:55.755000  7.302786
2000-08-16 09:30:10.642000  7.304059
2000-08-16 09:30:26.598000  7.304435
2000-08-16 09:30:41.372000  7.304314
2000-08-16 09:30:56.718000  7.304334

I would like to downsample this to 5min. Using

df.resample(rule='5Min',how='last',closed='left')

takes the closest point to the left in my data of a multiple of 5min; similarly

df.resample(rule='5Min',how='first',closed='left')

takes the closes point to the right. However, I would like to take the linear interpolation between the point to the left and right instead, e.g. if my df contains the two consecutive entries

time t1, price p1
time t2, price p2

and

t1<t<t2 where t is a multiple of 5min

then the resampled dataframe should have the entry

time t, price p1+(t-t1)/(t2-t1)*(p2-p1)
chrisaycock
  • 36,470
  • 14
  • 88
  • 125
fact
  • 557
  • 6
  • 17
  • 1
    There was a [similar question yesterday](http://stackoverflow.com/q/25234941/478288), but that didn't get a good answer though. – chrisaycock Aug 11 '14 at 15:31
  • 1
    Meanwhile that other question got some more good answers, e.g. [this one with .interpolate](http://stackoverflow.com/a/39730730/1389680). – Robert Pollak Oct 30 '16 at 01:27

1 Answers1

2

try creating two separate dataframes, reset_index them (so they have the same numerical index), fillna on them, and then just do the math on df1 and df2. e.g:

df1 = df.resample(rule='5Min',how='last',closed='left').reset_index().fillna(method='ffill')
df2 = df.resample(rule='5Min',how='first',closed='left').reset_index().fillna(method='ffill')

dt = df1.datetime - df2.datetime
px_fld = df1.price + ...

something like that should do the trick.

acushner
  • 9,595
  • 1
  • 34
  • 34
  • Thanks for the answer. However, dt = df1.datetime - df2.datetime is then not defined. – fact Aug 11 '14 at 18:48
  • what do you mean by that? it's defined at `dt = df1.datetime - df2.datetime`, or do you mean something else? – acushner Aug 11 '14 at 19:18
  • 1
    Sorry what i meant is that df1.datetime gives an error since we used reset_index. However, just using df1['index'] instead works then exactly as you said. Many thanks! – fact Aug 12 '14 at 10:24
  • ah, i see. i was thinking that because the original df had 'datetime' as the index name that it would carry over. sorry about that, but glad you figured it out. – acushner Aug 12 '14 at 12:26
  • I need to do this also but since the answer is not complete I am having trouble. My attempt goes like this `val = df1.value + (df0.timestamp - df1.timestamp) / (df2.timestamp - df1.timestamp) * (df2.value - df1.value)`. I can't see a use for df1.datetime - df2.datetime. – Graeme Stuart May 06 '15 at 11:16