Consider the following Series:

created_at
2014-01-27 21:50:05.040961    80000.00
2014-03-12 18:46:45.517968    79900.00
2014-09-05 20:54:17.991260    63605.31
2014-11-04 01:16:08.286631    64405.31
2014-11-04 01:17:26.398272    63605.31
2014-11-04 01:24:38.225306    64405.31
2014-11-13 19:32:14.273478    65205.31
Name: my_series, dtype: float64

I need to sample this Series on a specific set of pre-defined days (e.g. every day from 2014-12-01 to 2014-12-07). On each such sample, I would like to get the most recent value available from the original Series.

I have been looking at resample (see also this and this thread), since it looks like the right tool for the job. However, I don't have a good grasp of the function yet. Can resample be used for this? If so, how?

Amelio Vazquez-Reina

1 Answer

If you first define the set of pre-defined days (days in my example below), you can reindex the Series with it and specify a fill method. 'ffill' propagates the last valid observation forward, which for a time series means taking the most recent value available at each new timestamp:

In [19]: s
Out[19]: 
time
2014-01-27 21:50:05.040961    80000.00
2014-03-12 18:46:45.517968    79900.00
2014-09-05 20:54:17.991260    63605.31
2014-11-04 01:16:08.286631    64405.31
2014-11-04 01:17:26.398272    63605.31
2014-11-04 01:24:38.225306    64405.31
2014-11-13 19:32:14.273478    65205.31
Name: my_series, dtype: float64

In [20]: days = pd.date_range('2014-12-01', '2014-12-07')

In [21]: s.reindex(days, method='ffill')
Out[21]: 
2014-12-01    65205.31
2014-12-02    65205.31
2014-12-03    65205.31
2014-12-04    65205.31
2014-12-05    65205.31
2014-12-06    65205.31
2014-12-07    65205.31
Freq: D, Name: my_series, dtype: float64

In this case (the example dates you gave), every sampled day gets the same value, because the most recent observation in the original Series is the same for all of those dates.
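
As a self-contained sketch of the reindex approach, the snippet below rebuilds the Series from the question and repeats the call. The variable names `sampled` and `mixed` and the extra dates in `mixed` are my own additions for illustration. They show two things that the December range hides: a date before the first observation yields NaN, and each day in `days` is a midnight timestamp, so an observation made later on that same day is not picked up yet.

```python
import pandas as pd

# Rebuild the example Series from the question (values copied from the post).
s = pd.Series(
    [80000.00, 79900.00, 63605.31, 64405.31, 63605.31, 64405.31, 65205.31],
    index=pd.to_datetime([
        "2014-01-27 21:50:05.040961",
        "2014-03-12 18:46:45.517968",
        "2014-09-05 20:54:17.991260",
        "2014-11-04 01:16:08.286631",
        "2014-11-04 01:17:26.398272",
        "2014-11-04 01:24:38.225306",
        "2014-11-13 19:32:14.273478",
    ]),
    name="my_series",
)

# The pre-defined sampling days from the question (midnight timestamps).
days = pd.date_range("2014-12-01", "2014-12-07")

# Forward-fill: each day takes the last observation at or before it.
sampled = s.reindex(days, method="ffill")

# Hypothetical extra dates, chosen to show the edge cases.
extra = pd.to_datetime(["2014-01-01", "2014-03-12", "2014-03-13"])
mixed = s.reindex(extra, method="ffill")

print(sampled)
print(mixed)
```

Note that reindex with a fill method needs the original index to be sorted, which is the case here.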

If you don't want to specify a set of days, but just want every date from the start to the end of the original Series, you can use resample to get the same result:

In [23]: s.resample('D').last().ffill()  # pandas < 0.18: s.resample('D', how='last', fill_method='ffill')
Out[23]: 
time
2014-01-27    80000
2014-01-28    80000
2014-01-29    80000
2014-01-30    80000
...
2014-11-10    64405.31
2014-11-11    64405.31
2014-11-12    64405.31
2014-11-13    65205.31
Freq: D, Name: my_series, Length: 291
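
For reference, here is a runnable sketch of the resample variant written with the method-chaining form, since the `how` and `fill_method` keywords are no longer accepted by recent pandas releases. The name `daily` is my own. One difference from reindexing on midnight timestamps, as far as I can tell: resample labels each day with the last value observed within that day, so 2014-03-12 already shows the value recorded at 18:46 that day.

```python
import pandas as pd

# Rebuild the example Series from the question (values copied from the post).
s = pd.Series(
    [80000.00, 79900.00, 63605.31, 64405.31, 63605.31, 64405.31, 65205.31],
    index=pd.to_datetime([
        "2014-01-27 21:50:05.040961",
        "2014-03-12 18:46:45.517968",
        "2014-09-05 20:54:17.991260",
        "2014-11-04 01:16:08.286631",
        "2014-11-04 01:17:26.398272",
        "2014-11-04 01:24:38.225306",
        "2014-11-13 19:32:14.273478",
    ]),
    name="my_series",
)

# Take the last observation within each calendar day (NaN for empty days),
# then carry the most recent value forward over the days without data.
daily = s.resample("D").last().ffill()

print(daily.head())
print(daily.tail())
```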
joris