I have just started learning python and struggling with this code. I have a dataframe which looks like shown in graph.
I want to find the first occurrence of the window in the dataframe which has certain number of values greater than threshold.
For Exmaple:
Let's say dataframe is 1000000 values in dimension. I want divide this in sliding window of 1000 and need to know whether this 1000 values alteast have 10 values which are greater than certain threshold. If first window (point 0-999) does not have the atleast 10 values greater than certain threshold, window will slide and consider values 1-1000. I have to find the index of the first occurrence of window which has atleast 10 values greater than threshold.
Also as I am dealing here streaming data, I need to stop the search when such window in dataframe occurs.
I tried this code but getting key error and not able to solve the problem.
for i in np.arange(0,len(data)-999):
for j in np.arange(0,1000):
if data[i+j]>threshold:
var_count=var_count+1
if var_count>10:
print("Anomaly has occurred")
Sample data looks like this which has around 1.8 million rows.
Sample data could looke like this
data_sample=[1,1,0,0,0,2,1,1,1,1,1,2,1,1,1,1,1,1,2,1,2,2,1,0,0,2,2,2,2,1,1,1]
data_sample=pd.DataFrame(data_sample)
threshold=1
window=5
in that I need at least 2 values which are greater than 1
which would return the index 18 as at that index my window of length 5 has at least 2 values which are greater than 1.