This is a follow-up to another question: "Causal resampling: Sum over the last X <time_unit>".

Say I have the following time series:

                                   money_spent
timestamp                 
2014-10-06 18:00:40.063000-04:00      0.568000
2014-10-06 18:00:41.361000-04:00      3.014770
2014-10-06 18:00:42.896000-04:00      0.878154
2014-10-06 18:00:43.040000-04:00      0.723077
2014-10-06 18:00:44.791000-04:00      0.723077
2014-10-06 18:00:45.496000-04:00      0.309539
2014-10-06 18:00:45.799000-04:00      3.032000
2014-10-06 18:00:47.470000-04:00      3.014770
2014-10-06 18:00:48.092000-04:00      1.584616

I would like to sample it:

  • At pre-defined time points (e.g. a range of timestamps every 2.5 seconds starting from 18:00 until 19:00)
  • For every sample, get the sum of money_spent within that sample's interval.

Update with example

For example, assuming that I generate a set of pre-defined timestamps as follows:

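import datetime

import pandas as pd
import pytz

# Start at 18:00. localize() is used because passing a pytz zone via tzinfo=
# silently applies the zone's LMT offset instead of EDT.
eastern    = pytz.timezone('US/Eastern')
start_time = eastern.localize(datetime.datetime(year  = 2014,
                                                month = 10,
                                                day   = 6,
                                                hour  = 18))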

# Finish 400 seconds later
end_time    = start_time + datetime.timedelta(seconds=400)

my_new_timestamps = pd.date_range(start = start_time, 
                                  end   = end_time, 
                                  freq  = '2.5s')

I would like to re-sample the original dataframe at the top of the post at the locations defined by my_new_timestamps, taking the sum of money_spent for each one.

Note that the original dataframe only covers from ~18:00:40 until ~18:00:48, so if I do:

resample('2.5S', how='sum', label='right')

the command above only returns samples within the time window between these two times, not between the start and end times defined by my_new_timestamps. It may also sample on 2.5s intervals that differ from the ones I want (those defined by my_new_timestamps).
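To make the desired output concrete, here is a minimal sketch that sums money_spent directly onto a pre-defined grid, independent of how `resample` anchors its bins. The helper name `sum_onto_grid` is hypothetical, and two choices are assumptions rather than anything stated above: each grid point collects the spend in the interval `(previous grid point, this grid point]`, and the first grid point, which has no preceding interval, gets 0.

```python
import numpy as np
import pandas as pd

# The sample data from the top of the post (timestamps in US/Eastern).
df = pd.DataFrame(
    {"money_spent": [0.568000, 3.014770, 0.878154, 0.723077, 0.723077,
                     0.309539, 3.032000, 3.014770, 1.584616]},
    index=pd.DatetimeIndex(
        ["2014-10-06 18:00:40.063", "2014-10-06 18:00:41.361",
         "2014-10-06 18:00:42.896", "2014-10-06 18:00:43.040",
         "2014-10-06 18:00:44.791", "2014-10-06 18:00:45.496",
         "2014-10-06 18:00:45.799", "2014-10-06 18:00:47.470",
         "2014-10-06 18:00:48.092"],
        name="timestamp",
    ).tz_localize("US/Eastern"),
)

# The intended grid: every 2.5 s from 18:00:00 for 400 s (161 points).
my_new_timestamps = pd.date_range(start="2014-10-06 18:00:00", periods=161,
                                  freq="2500ms", tz="US/Eastern")


def sum_onto_grid(series, grid):
    """Hypothetical helper: sum `series` over (grid[k-1], grid[k]] for each k."""
    # Compare as timezone-naive UTC datetime64 values.
    obs = series.index.tz_convert("UTC").tz_localize(None).values
    pts = grid.tz_convert("UTC").tz_localize(None).values
    # For each observation, the position of the first grid point >= its timestamp.
    pos = np.searchsorted(pts, obs, side="left")
    # Drop observations at/before the first grid point or after the last one.
    inside = (pos > 0) & (pos < len(pts))
    totals = np.zeros(len(pts))
    # Unbuffered add, so several observations can land in the same bin.
    np.add.at(totals, pos[inside], series.values[inside])
    return pd.Series(totals, index=grid, name=series.name)


resampled = sum_onto_grid(df["money_spent"], my_new_timestamps)
print(resampled[resampled > 0])
```

Grid points with no spend in their interval come out as 0 here; replacing them with NaN would be an equally reasonable convention.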

Amelio Vazquez-Reina
  • pls post what you are expecting, as I suppose ``df.resample('2500L',how='sum')`` doesn't work for you? – Jeff Oct 08 '14 at 18:56
  • Thanks @Jeff I have updated the post. The problem with `resample` is that it only considers the time values where I have data (i.e. the time window defined by the index of the dataframe), but I want to get samples on a different set of timestamps. – Amelio Vazquez-Reina Oct 08 '14 at 19:04
  • try using ``freq='2500L'``; there is a bug with fractions in the frequency (it will make it 5s freq which is wrong). You can simply resample and it will work like you want. You reindex if you really want to (either before or after) to your full-range, but those will be NaN – Jeff Oct 08 '14 at 19:16
  • Thanks @Jeff. If I re-index beforehand, will `resample` start counting `freq` from the first timestamp in the dataframe? (if so, this fully answers the problem). – Amelio Vazquez-Reina Oct 08 '14 at 19:21
  • yes it will 'snap' it to the first freq (you can offset this using ``loffset`` if desired and/or ``label``, which counts from the biggest stamp) – Jeff Oct 08 '14 at 19:24
  • Thanks @Jeff -- That perfectly answers the question! – Amelio Vazquez-Reina Oct 08 '14 at 19:25
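The approach suggested in the comments (resample on a 2500 ms frequency, then reindex to the full range) might look roughly like this in a recent pandas. This is a sketch under assumptions, not a tested answer from the thread: it uses the `'2500ms'` alias and `.sum()` in place of the older `'2500L'` and `how='sum'` spellings, picks `closed='right'` together with `label='right'`, and fills empty bins with 0 instead of NaN.

```python
import pandas as pd

# The sample data from the question (timestamps in US/Eastern).
times = pd.DatetimeIndex(
    ["2014-10-06 18:00:40.063", "2014-10-06 18:00:41.361",
     "2014-10-06 18:00:42.896", "2014-10-06 18:00:43.040",
     "2014-10-06 18:00:44.791", "2014-10-06 18:00:45.496",
     "2014-10-06 18:00:45.799", "2014-10-06 18:00:47.470",
     "2014-10-06 18:00:48.092"],
    name="timestamp",
).tz_localize("US/Eastern")
money_spent = pd.Series(
    [0.568000, 3.014770, 0.878154, 0.723077, 0.723077,
     0.309539, 3.032000, 3.014770, 1.584616],
    index=times, name="money_spent",
)

# Full target range: every 2.5 s from 18:00:00 for 400 s.
full_range = pd.date_range(start="2014-10-06 18:00:00", periods=161,
                           freq="2500ms", tz="US/Eastern")

# Step 1: bin on a 2500 ms frequency. The bins snap to multiples of 2.5 s,
# which coincide with full_range because it starts on a whole minute.
binned = money_spent.resample("2500ms", closed="right", label="right").sum()

# Step 2: extend to the full range. Bins outside the data would be NaN
# by default; fill_value=0.0 is an assumption about what is wanted.
resampled = binned.reindex(full_range, fill_value=0.0)
print(resampled[resampled > 0])
```

This relies on the target timestamps lining up with the bins that `resample` chooses. For a grid that does not fall on multiples of the frequency, the bins would need to be shifted or the sums computed against the grid directly.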
