I have a pandas DataFrame with a 'data' column and a 'cond' (condition) column. I need the mean of the 'data' values over the rows that form the longest run of consecutive True values in 'cond'.
Example DataFrame:
   cond   data
0  True   0.20
1  False  0.30
2  True   0.90
3  True   1.20
4  True   2.30
5  False  0.75
6  True   0.80
Expected result: 1.467 (rounded), which is the mean of 'data' over rows 2 to 4, the longest run with 3 consecutive True values.
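To make the example reproducible, the frame can be rebuilt and the expected value checked by hand with a short snippet like the one below. This is only a sketch: the variable names `df` and `expected` are chosen for illustration, and the row range 2 to 4 is hard-coded from the example rather than computed.

```python
import pandas as pd

# Rebuild the example frame from the table above.
df = pd.DataFrame({
    "cond": [True, False, True, True, True, False, True],
    "data": [0.20, 0.30, 0.90, 1.20, 2.30, 0.75, 0.80],
})

# Rows 2..4 are the longest run of consecutive True values in 'cond'
# (hard-coded here; .loc slicing is inclusive on both ends).
expected = df.loc[2:4, "data"].mean()
print(round(expected, 3))  # 1.467
```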
I was not able to find a "vectorized" solution using groupby or pivot, so I wrote a function that loops over the rows. Unfortunately, this takes about an hour for 1 million rows, which is way too long, and the @jit decorator does not reduce the runtime measurably.
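For illustration, a row loop of this kind could look roughly like the sketch below. It is not the exact function from my project: the name `longest_true_run_mean` is hypothetical, and the tie rule (the first longest run wins) is an assumption, since the example does not contain a tie.

```python
import pandas as pd

def longest_true_run_mean(cond, data):
    """Mean of `data` over the longest run of consecutive True in `cond`.

    Assumption: if several runs share the maximum length, the first wins.
    Returns NaN when `cond` contains no True at all.
    """
    best_len, best_sum = 0, 0.0  # longest run seen so far
    run_len, run_sum = 0, 0.0    # run currently being scanned
    for c, d in zip(cond, data):
        if c:
            run_len += 1
            run_sum += d
            # Strict '>' keeps the earliest run on ties.
            if run_len > best_len:
                best_len, best_sum = run_len, run_sum
        else:
            run_len, run_sum = 0, 0.0
    return best_sum / best_len if best_len else float("nan")

df = pd.DataFrame({
    "cond": [True, False, True, True, True, False, True],
    "data": [0.20, 0.30, 0.90, 1.20, 2.30, 0.75, 0.80],
})
result = longest_true_run_mean(df["cond"], df["data"])
print(round(result, 3))  # 1.467
```

The loop visits each row exactly once, so the cost comes from the per-row Python overhead rather than from the algorithm itself.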
The data I want to analyze comes from a one-year monitoring project that produces a DataFrame with one million rows every 3 hours, so there are about 3000 such files.
An efficient solution is therefore very important. I would also be very grateful for a solution in NumPy.