I have a Pandas DataFrame named df with a column named 'step', which is just an incremental counter (1, 2, 3, 4, etc.):
step col1 col2
1 2 3
2 3 5
3 1 0
4 8 9
5 2 3
I'm selecting some rows of interest from df:
work_df = df[df['col1'] < df['col2']]
step col1 col2
1 2 3
2 3 5
4 8 9
5 2 3
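For anyone who wants to reproduce this, here is a minimal sketch of the setup. The values are copied from the tables above; the default integer index is an assumption, since the post doesn't show my real index.

```python
import pandas as pd

# Sample data copied from the first table in the question
df = pd.DataFrame({
    'step': [1, 2, 3, 4, 5],
    'col1': [2, 3, 1, 8, 2],
    'col2': [3, 5, 0, 9, 3],
})

# Keep only rows where col1 is smaller than col2 (this drops step 3)
work_df = df[df['col1'] < df['col2']]

print(work_df.to_string(index=False))
```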
Now I need to split work_df into several sub_df's by continuity of 'step' (i.e. if work_df['step'] == [1,2,3,7,8,9], then [1,2,3] belongs to sub_df_1 while [7,8,9] belongs to sub_df_2, etc.). Currently I'm doing it this way:
for idx, row in work_df.iterrows():
    if row['step'] > prev_step + 1:
        if step_count > 1:  # don't want to have a df with only 1 row
            interval_list.append({'step_count': step_count ... })
        step_count = 0
    else:
        step_count += 1
    prev_step = row['step']
I'm then building new sub_df's based on the information in interval_list. But I'm not sure this is the best way to achieve what I really need:
sub_df1=
step col1 col2
1 2 3
2 3 5
sub_df2=
step col1 col2
4 8 9
5 2 3
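To make the desired result concrete, here is a rough sketch of one possible way to produce it. It is not the loop I currently use but a swapped-in technique: label each run of consecutive 'step' values with diff() and cumsum(), then group on that label. The helper name split_by_continuity and the min_rows parameter are hypothetical names introduced for illustration, and the sketch assumes the frame is already sorted by 'step'.

```python
import pandas as pd

def split_by_continuity(frame, column='step', min_rows=2):
    """Split frame into sub-frames in which `column` increases by exactly 1."""
    # A new run starts wherever the difference to the previous row is not 1.
    # The first row has a NaN difference, so it also starts a run.
    run_id = frame[column].diff().ne(1).cumsum().rename('run_id')
    # Group rows by run label and drop runs shorter than min_rows.
    return [group for _, group in frame.groupby(run_id) if len(group) >= min_rows]

# work_df as shown in the question
work_df = pd.DataFrame({
    'step': [1, 2, 4, 5],
    'col1': [2, 3, 8, 2],
    'col2': [3, 5, 9, 3],
})

sub_dfs = split_by_continuity(work_df)
for sub_df in sub_dfs:
    print(sub_df.to_string(index=False))
```

With min_rows=2, runs consisting of a single row are skipped, which mirrors the step_count > 1 check in my loop.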
Are there better ways to split a DataFrame by continuity of a column?