First of all, my apologies if the title was too ambiguous.
I have a pd.DataFrame with datetime64 as the dtype of its index. The index values, however, are not equally spaced: most are one minute apart, but there are often other intervals, such as two minutes.
Suppose I have a pd.DataFrame:
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': ['2018-11-28 13:59:00', '2018-11-28 14:00:00',
                            '2018-11-28 14:01:00', '2018-11-28 14:02:00',
                            '2018-11-28 14:03:00', '2018-11-28 14:05:00',
                            '2018-11-28 14:06:00', '2018-11-28 14:07:00',
                            '2018-11-28 14:08:00', '2018-11-28 14:09:00'],
                   'count': np.random.randint(1, 100, 10)})
datetime_index = pd.to_datetime(df['date'])
# Use the parsed timestamps as the index and drop the original string column.
df = df.set_index(datetime_index).drop(columns='date')
df.sort_index(inplace=True)
such that df is:
count
date
2018-11-28 13:59:00 14
2018-11-28 14:00:00 30
2018-11-28 14:01:00 2
2018-11-28 14:02:00 42
2018-11-28 14:03:00 51  <<< two-minute gap
2018-11-28 14:05:00 41  <<< unlike the others
2018-11-28 14:06:00 48
2018-11-28 14:07:00 4
2018-11-28 14:08:00 50
2018-11-28 14:09:00 93
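The irregular spacing can be checked directly. The following is only a minimal sketch, assuming the index is a sorted DatetimeIndex; the names idx, deltas, interval_counts, and after_gap are illustrative and not part of my original code.

```python
import pandas as pd

# Same timestamps as in the sample above.
idx = pd.to_datetime([
    '2018-11-28 13:59:00', '2018-11-28 14:00:00', '2018-11-28 14:01:00',
    '2018-11-28 14:02:00', '2018-11-28 14:03:00', '2018-11-28 14:05:00',
    '2018-11-28 14:06:00', '2018-11-28 14:07:00', '2018-11-28 14:08:00',
    '2018-11-28 14:09:00'])

# Difference between each timestamp and the previous one; the first is NaT.
deltas = idx.to_series().diff()

# How often each interval occurs (NaT is excluded by default).
interval_counts = deltas.value_counts()

# Timestamps that come right after a gap longer than one minute.
after_gap = deltas[deltas > pd.Timedelta(minutes=1)].index

print(interval_counts)
print(after_gap)
```

For the sample, this should report the one-minute interval eight times and the two-minute interval once, with 14:05:00 as the only timestamp following a gap.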
My goal is to divide the df into multiple chunks where each chunk has a consistent frequency of one minute. Thus, the expected result from the above would be:
#df0
count
date
2018-11-28 13:59:00 14
2018-11-28 14:00:00 30
2018-11-28 14:01:00 2
2018-11-28 14:02:00 42
2018-11-28 14:03:00 51
#df1
count
date
2018-11-28 14:05:00 41
2018-11-28 14:06:00 48
2018-11-28 14:07:00 4
2018-11-28 14:08:00 50
2018-11-28 14:09:00 93
I have tried "Split a series on time gaps in pandas?", which sadly was quite outdated and did not serve my purpose.
I did actually achieve what I want for the above sample, but the actual dataframe is much larger and has many more gaps, which makes the following solution extremely inefficient:
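# Time difference between each row and the previous one (NaT for the first row).
df['diff'] = pd.Series(df.index).diff().values
dif = pd.Series(df.index).diff()
# Position of the first two-minute gap (120000000000 ns = 2 minutes).
gap_index = dif[dif == pd.to_timedelta(120000000000)].index[0]
# Split at that single gap only.
df[:gap_index], df[gap_index:]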
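One direction that might generalize to many gaps is to label each row with a running chunk number and group on that label. This is only a sketch, not a tested solution for the real data: it assumes the index is a sorted DatetimeIndex, and the helper name split_on_gaps and its freq parameter are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd


def split_on_gaps(df, freq=pd.Timedelta(minutes=1)):
    """Split df into chunks whose consecutive timestamps are exactly freq apart.

    Hypothetical helper; assumes df has a sorted DatetimeIndex.
    """
    # Difference between each timestamp and the previous one (NaT for the first row).
    deltas = df.index.to_series().diff()
    # A row starts a new chunk whenever its delta differs from freq.
    # NaT compares unequal to freq, so the first row opens the first chunk.
    chunk_id = (deltas != freq).cumsum()
    # Group on the plain array of labels to avoid any index alignment.
    return [chunk for _, chunk in df.groupby(chunk_id.to_numpy())]


dates = pd.to_datetime([
    '2018-11-28 13:59:00', '2018-11-28 14:00:00', '2018-11-28 14:01:00',
    '2018-11-28 14:02:00', '2018-11-28 14:03:00', '2018-11-28 14:05:00',
    '2018-11-28 14:06:00', '2018-11-28 14:07:00', '2018-11-28 14:08:00',
    '2018-11-28 14:09:00'])
sample = pd.DataFrame({'count': np.random.randint(1, 100, 10)}, index=dates)

chunks = split_on_gaps(sample)
for i, chunk in enumerate(chunks):
    print(f'#df{i}')
    print(chunk)
```

Because the chunk labels come from a single vectorized diff and cumsum, this avoids searching for one gap at a time. Whether it is fast enough for the real dataframe is something I have not measured.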
I would very much appreciate any insight on this issue.