I have a pandas DataFrame and want to loop over its rows, calculating a metric over a growing window: first from the first row to the 2nd; if the condition is not met there, from the first row to the 3rd row, then the 4th row, and so on, each time comparing the metric with another value. I want to get the row number at which the condition is first met. To give a concrete example, for a DataFrame of length 30, the resulting segments might be df.iloc[0:10], df.iloc[10:15], df.iloc[15:27], and df.iloc[27:30], where the values 10, 15, 27 are stored in a list.
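To make the segment example concrete, here is a small sketch of how a list of cut points maps to those slices. The 30-row frame and the name `cut_points` are made up for illustration, and the cut points are assumed to be exclusive slice ends, as in the df.iloc ranges above.

```python
import numpy as np
import pandas as pd

# Hypothetical 30-row frame and cut points, mirroring the example above.
df = pd.DataFrame({"A": np.arange(30)})
cut_points = [10, 15, 27]

# Pair each start with the next end: (0, 10), (10, 15), (15, 27), (27, 30).
bounds = [0] + cut_points + [len(df)]
segments = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

# Number of rows in each segment.
segment_lengths = [len(segment) for segment in segments]
print(segment_lengths)  # [10, 5, 12, 3]
```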
An example dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100, size=(100, 1)), columns=list('A'))
df
A
0 5
1 11
2 8
3 1
4 16
5 24
some_value = 20
mylist = []
for i in range(len(df)):
    for j in range(i + 2, len(df)):
        # Metric calculated on the relevant rows
        metric = df.iloc[i:j]['A'].sum()
        if metric >= some_value:
            mylist.append(j)
            break
The loop starts with df.iloc[0:2] and calculates 5+11; since that is not greater than some_value (20), it moves on to df.iloc[0:3]. This time, since 5+11+8 is greater than some_value, I want to save this row number (2) and not check df.iloc[0:4]. Then the loop should start checking again from df.iloc[3:5] (1+16); since the condition is not met, it should continue with df.iloc[3:6] (1+16+24), and so on, saving each point where the condition is met.
Example output for this case is a list with values:
[2, 5]
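The behavior described above could be sketched roughly as follows. This is only one possible reading of it, not a definitive solution: the function name `find_break_points` and the parameters `metric` and `min_rows` are hypothetical, the saved number is assumed to be the index of the last row in the window (which matches the expected [2, 5]), and each new window is assumed to start at the row after the saved one with at least two rows.

```python
import pandas as pd

def find_break_points(df, threshold, metric, min_rows=2):
    """Grow a window from `start`; when the metric reaches `threshold`,
    record the last row of the window and restart after it."""
    break_points = []
    start = 0
    stop = start + min_rows  # exclusive end of the current window
    while stop <= len(df):
        if metric(df.iloc[start:stop]) >= threshold:
            break_points.append(stop - 1)  # position of the last row included
            start = stop                   # restart right after that row
            stop = start + min_rows
        else:
            stop += 1                      # extend the window by one row
    return break_points

# The six rows shown in the example output above.
df = pd.DataFrame({"A": [5, 11, 8, 1, 16, 24]})
some_value = 20

result = find_break_points(df, some_value, metric=lambda window: window["A"].sum())
print(result)  # [2, 5]
```

The metric is passed in as a function so the same loop works for metrics other than a sum. It recomputes the metric on every window, so it can be slow on long frames; for a plain sum, a running total would avoid the repeated slicing.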
I wrote the code above but couldn't fully achieve what I want. Could you help with this issue? Thanks.