Copy the following dataframe to your clipboard (select fully):
Time Group1 Value Group2
0 05 A 0.91 0.5/1
1 05 A 0.88 0.5/1
2 05 A 0.82 0.5/1
3 06 A 0.79 0.5/1
4 08 A 0.74 0.5/1
5 08 A 0.71 1
6 09 A 0.70 0.5/1
7 09 A 0.81 0.5/1
8 10 A 0.86 0.5/1
9 11 A 0.83 0.5/1
11 13 B 0.80 0.5/1
12 13 B 0.71 0.5/1
13 14 B 0.87 0.5/1
14 14 B 0.92 0.5/1
15 14 B 0.97 0.5/1
16 15 B 1.07 0.5/1
17 15 B 0.76 0.5
18 15 B 1.03 0.5/1
19 15 B NaN NaN
20 16 C 0.81 0.5
21 16 C 0.82 0.5
22 16 C 0.85 0.5
23 17 C 0.95 0.5
24 18 C 0.90 0.5
25 18 C 0.87 0.5
26 18 C 0.82 1
27 18 C 0.86 0.5
28 18 C 0.83 0.5
29 19 C 1.04 0.5/1
30 19 C 0.99 0.5/1
31 20 C 0.95 0.5/1
32 20 C 0.92 0.5/1
33 22 C 0.89 0.5/1
34 22 C 0.92 0.5/1
35 23 C 0.90 0.5/1
36 23 C 0.93 0.5/1
Now use
import pandas as pd
df=pd.read_clipboard(sep='\s\s+')
to load it into your environment.
ALGORITHMN / QUESTION
How to find every sequence of declining values in column Value
, which has at least N declining values and a total decline of >= DELTA. Multiple such sequences may exist in the Dataframe and shall be output. A sequence is interrupted, if the value in either Group1
/ Group2
column changes. If the value in Values
does not decline but stays the same, this is not counting towards reaching N rows, but does neither interrupt the sequence.
DESIRED OUTPUT:
Desired output shall be a list of sequences, for every sequence I want to know the start-time, the end-time, the start-value in check
and the end-value in check
, and also the length
, i.e. the number of declining steps.
EXAMPLES FOR THE DATAFRAME: For N = 4 and DELTA = 0.1, I would find: Time 05 with value 0.91 until Time 08 with Value 0.74. The sequence ends because the next row contains a change in Group2, even though the value would decline further. This sequence has 4 steps and a total delta of 0.91-0.74 = 0.17.
The second sequence goes from Time 19 (value 1.04) until Time 22 (value 0.89), 4 declines and a total Delta of 0.15.
SOLUTION: Here are my thoughts:
(1) Combine both Groups to a new column which could be checked for sameness against previous row (similar to this one -> Comparing previous row values in Pandas DataFrame)
df['combined'] = df['Group1'] + '___' + df['Group2']
df['same'] = df['combined'].eq(df['combined'].shift())
(2) Find whether the column Value
is declining, calculate difference:
df['decline'] = df['Value'].diff().fillna(0.0)
(3) Delete rows which have df['decline'] == 0
, i.e. just a same value, because this wont count towards reaching threshold N, so not required:
df = df[df['decline'] != 0]
(4) Combine (1) and (3), i.e. check for declining values if they are not interrupted by a change of value in Group1 or Group2:
df['is_sequence'] = (df['decline'] < 0) & df['same'] # Still fails, have to check how to make it correct...
And now check the length of the total delta of the sequences...
Any ideas?
EDIT says: Maybe some functions like .rolling() or grouping might help!? I found a similar question here about finding sequences: Python Pandas: Find a pattern in a DataFrame which uses a for-loop. You do not need to suggest something like this, because I already know how to do it in a loop. I'm asking here for a more pandas/pythonic solution if possible