How about .resample()?
#first loading your data
>>> import pandas as pd
>>>
>>> df = pd.read_csv('dates.csv', index_col='timestamp', parse_dates=True)
>>> df.head()
name age
timestamp
2020-03-01 00:00:01 nick NaN
2020-03-01 00:00:01 john NaN
2020-03-01 00:00:02 nick NaN
2020-03-01 00:00:02 john NaN
2020-03-01 00:00:04 peter NaN
#resampling it at a frequency of 2 seconds
>>> resampled = df.resample('2s')
>>> type(resampled)
<class 'pandas.core.resample.DatetimeIndexResampler'>
#iterating over the resampler object and storing the sliced dfs in a dictionary
>>> df_dict = {}
>>> for i, (timestamp, group) in enumerate(resampled):
...     df_dict[i] = group
...
>>> df_dict[0]
name age
timestamp
2020-03-01 00:00:01 nick NaN
2020-03-01 00:00:01 john NaN
Now for some explanation...
resample() is great for rebinning DataFrames based on time (I use it often for downsampling time series data), but it can also be used simply to cut up the DataFrame, as you want to do. Iterating over the resampler object produced by df.resample() yields tuples of (name of the bin, df corresponding to that bin): e.g. the first tuple is (timestamp at which the first bin starts, data falling in the first 2 seconds). So to get the DataFrames out, we can loop over this object and store them somewhere, like a dict.
Note that this will produce every 2-second interval from the start to the end of the data, so many will be empty given your data. But you can add a step to filter those out if needed.
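As a rough sketch of that filtering step, the loop can skip any bin whose DataFrame is empty. The sample data below is a made-up stand-in for your file (the name column and timestamps are assumptions for illustration), and the dict is keyed by the bin's start timestamp rather than a counter:

```python
import pandas as pd

# Hypothetical stand-in for the data in the question
idx = pd.to_datetime([
    "2020-03-01 00:00:01",
    "2020-03-01 00:00:01",
    "2020-03-01 00:00:02",
    "2020-03-01 00:00:02",
    "2020-03-01 00:00:07",
])
df = pd.DataFrame({"name": ["nick", "john", "nick", "john", "peter"]}, index=idx)
df.index.name = "timestamp"

# Keep only the 2-second bins that actually contain rows,
# keyed by the timestamp at which each bin starts
df_dict = {
    bin_start: group
    for bin_start, group in df.resample("2s")
    if not group.empty
}
```

Keying by the bin start instead of an integer means you can look up a slice directly by its time, e.g. df_dict[pd.Timestamp("2020-03-01 00:00:02")].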
Additionally, you could manually assign each sliced DataFrame to a variable, but this would be cumbersome (you would probably need to write a line for each 2-second bin, rather than a single small loop). With a dictionary, you can still associate each DataFrame with a key to look it up by. You could also use an OrderedDict, a list, or whatever collection suits you.
A couple points on your script:
- Setting freq to "0.2T" gives 12 seconds (0.2 * 60); you can use freq="2s" instead.
- The example df and df2 are "out of phase": one is binned in 2 seconds starting on an odd number (1-2 seconds), while the other starts on an even number (4-5 seconds). So the date_range you mentioned wouldn't create those bins; it would create dfs from either 0-1s, 2-3s, 4-5s, ... OR 1-2s, 3-4s, 5-6s, ..., depending on which timestamp it started on.
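For the latter point, you can use the base argument of .resample() to set the "phase" of the resampling. So in the case above, base=0 would start bins on even numbers, and base=1 would start bins on odds. (Note that base was deprecated in pandas 1.1 and removed in 2.0; on newer versions, offset='1s' plays the role of base=1.)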
This is assuming you are okay with that type of binning - if you really want 1-2 seconds and 4-5 seconds to be in different bins, you would have to do something more complicated I believe.
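To make the "phase" idea concrete, here is a small sketch. It uses the offset argument (available in pandas 1.1 and later) rather than base, since base no longer exists in recent pandas; the value column and timestamps are invented for illustration:

```python
import pandas as pd

# Hypothetical data: one row per second from 00:00:01 to 00:00:04
idx = pd.to_datetime([
    "2020-03-01 00:00:01",
    "2020-03-01 00:00:02",
    "2020-03-01 00:00:03",
    "2020-03-01 00:00:04",
])
df = pd.DataFrame({"value": [1, 2, 3, 4]}, index=idx)

# Default phase: bins start on even seconds, i.e. [00, 02), [02, 04), [04, 06)
even = df.resample("2s").sum()

# Shifted phase: bins start on odd seconds, i.e. [01, 03), [03, 05)
odd = df.resample("2s", offset="1s").sum()
```

Here even is labelled at seconds 0, 2 and 4, while odd is labelled at seconds 1 and 3, so the same rows end up grouped differently depending on the phase.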