The basic problem you are trying to solve is hysteresis, i.e. a state that depends on its history and not only on the current inputs.
Aside from that, the logic to capture the intervals of `a` and `b` can be expressed using `pd.cut()`:

```python
import numpy as np
import pandas as pd

a = df['Parameter A']
b = df['Parameter B']
cat_a = pd.cut(a, [-np.inf, 1, 1.5, 2, 3, np.inf], labels=[0,1,1.5,2,3], right=False)
cat_b = pd.cut(b, [-np.inf, 120, np.inf], labels=[0,2], right=False)
```
For `cat_a`, the bin labeled `1.5` corresponds to the "uncertain" zone [1.5, 2), where the hysteresis takes place: in that zone, if the previous phase was >= 2, use 2; otherwise, use 1.
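A quick way to check the `right=False` bin edges is to feed boundary values through the same `pd.cut()` calls. The sample values below are made up for illustration; `.astype(float)` is only there to print the labels as plain numbers.

```python
import numpy as np
import pandas as pd

# Boundary values for Parameter A. With right=False every bin is [left, right),
# so a value equal to an edge lands in the bin that starts at that edge.
a = pd.Series([0.9, 1.0, 1.49, 1.5, 1.99, 2.0, 3.0])
cat_a = pd.cut(a, [-np.inf, 1, 1.5, 2, 3, np.inf],
               labels=[0, 1, 1.5, 2, 3], right=False)
print(cat_a.astype(float).tolist())   # [0.0, 1.0, 1.0, 1.5, 1.5, 2.0, 3.0]

# For Parameter B, 120 itself already counts as phase 2.
b = pd.Series([119, 120, 121])
cat_b = pd.cut(b, [-np.inf, 120, np.inf], labels=[0, 2], right=False)
print(cat_b.astype(float).tolist())   # [0.0, 2.0, 2.0]
```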
We use the row-wise `max` of `cat_a` and `cat_b` to establish a history-independent value (`tmp`):

```python
tmp = pd.concat([cat_a, cat_b], axis=1).max(axis=1)
```
```
>>> df.assign(tmp=tmp)
       Date and Time  Parameter A  Parameter B  Required output  tmp
0   2020-06-07 00:00          1.0          100          Phase 1  1.0
1   2020-06-07 00:01          1.0          101          Phase 1  1.0
2   2020-06-07 00:02          1.0           99          Phase 1  1.0
3   2020-06-07 00:03          1.0          102          Phase 1  1.0
4   2020-06-07 00:04          1.5          101          Phase 1  1.5
5   2020-06-07 00:05          2.0          105          Phase 2  2.0
6   2020-06-07 00:06          2.1          120          Phase 2  2.0
7   2020-06-07 00:07          2.2          125          Phase 2  2.0
8   2020-06-07 00:08          2.3          122          Phase 2  2.0
9   2020-06-07 00:09          1.6          123          Phase 2  2.0
10  2020-06-07 00:10          1.2           99          Phase 1  1.0
```
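The same `tmp` column can be reproduced from just the two parameter columns of the table (the timestamps are omitted since they play no role in the phase). This sketch converts each categorical to float before taking the row-wise max; that conversion is our addition, so that the max is a plain numeric reduction rather than relying on how pandas compares two categoricals with different category sets.

```python
import numpy as np
import pandas as pd

# Parameter columns copied from the table above.
df = pd.DataFrame({
    'Parameter A': [1.0, 1.0, 1.0, 1.0, 1.5, 2.0, 2.1, 2.2, 2.3, 1.6, 1.2],
    'Parameter B': [100, 101, 99, 102, 101, 105, 120, 125, 122, 123, 99],
})

cat_a = pd.cut(df['Parameter A'], [-np.inf, 1, 1.5, 2, 3, np.inf],
               labels=[0, 1, 1.5, 2, 3], right=False)
cat_b = pd.cut(df['Parameter B'], [-np.inf, 120, np.inf],
               labels=[0, 2], right=False)

# Assumption: cast the category labels to float so the max is numeric.
tmp = pd.concat([cat_a.astype(float), cat_b.astype(float)], axis=1).max(axis=1)
print(tmp.tolist())
```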
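Now, to implement the hysteresis itself, we use this SO answer, which relies on numpy. It is slightly adapted to include the left side of intervals: a value equal to `th_lo` (here 1.5) falls inside the uncertain band, consistent with the left-closed bins produced by `pd.cut(..., right=False)`: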
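```python
import numpy as np

def hyst(x, th_lo, th_hi, initial = False):
    hi = x >= th_hi                  # definitely "high"
    lo_or_hi = (x < th_lo) | hi      # outside the uncertain band [th_lo, th_hi)
    ind = np.nonzero(lo_or_hi)[0]    # positions where the state is decided
    if not ind.size: # prevent index error if ind is empty
        return np.zeros_like(x, dtype=bool) | initial
    cnt = np.cumsum(lo_or_hi) # from 0 to len(x)
    # Each position takes the state of the most recent decided position;
    # positions before the first decided one get `initial`.
    return np.where(cnt, hi[ind[cnt-1]], initial)
```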
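To make explicit what the vectorized `hyst()` computes, here is a plain-loop equivalent (`hyst_loop` is a name made up for this illustration). It walks the array and only changes state when a value leaves the uncertain band `[th_lo, th_hi)`; inside the band it keeps the previous state. The two should agree on any input; the sample array is arbitrary.

```python
import numpy as np

def hyst(x, th_lo, th_hi, initial=False):
    # Vectorized version from the answer.
    hi = x >= th_hi
    lo_or_hi = (x < th_lo) | hi
    ind = np.nonzero(lo_or_hi)[0]
    if not ind.size:
        return np.zeros_like(x, dtype=bool) | initial
    cnt = np.cumsum(lo_or_hi)
    return np.where(cnt, hi[ind[cnt - 1]], initial)

def hyst_loop(x, th_lo, th_hi, initial=False):
    # Hypothetical reference implementation, one element at a time.
    state = initial
    out = []
    for v in x:
        if v >= th_hi:
            state = True       # crossed the upper threshold: high
        elif v < th_lo:
            state = False      # below the lower threshold: low
        out.append(state)      # inside the band: keep the previous state
    return np.array(out, dtype=bool)

x = np.array([1.0, 1.5, 2.0, 1.5, 1.0, 1.5, 3.0, 1.5])
print(hyst(x, 1.5, 2))       # [False False  True  True False False  True  True]
print(hyst_loop(x, 1.5, 2))  # same result
```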
`hyst()` returns a boolean array indicating whether the phase should be "high" (`True`) or "low" (`False`). We then replace the uncertain values (1.5) with 1 or 2 depending on the hysteresis. Finally, we convert the numerical value of the phase into a string:

```python
phase = tmp.where(tmp != 1.5, np.where(hyst(tmp.values, 1.5, 2), 2, 1))
df = df.assign(phase='Phase ' + phase.astype(int).astype(str))
```
```
>>> df
       Date and Time  Parameter A  Parameter B  Required output    phase
0   2020-06-07 00:00          1.0          100          Phase 1  Phase 1
1   2020-06-07 00:01          1.0          101          Phase 1  Phase 1
2   2020-06-07 00:02          1.0           99          Phase 1  Phase 1
3   2020-06-07 00:03          1.0          102          Phase 1  Phase 1
4   2020-06-07 00:04          1.5          101          Phase 1  Phase 1
5   2020-06-07 00:05          2.0          105          Phase 2  Phase 2
6   2020-06-07 00:06          2.1          120          Phase 2  Phase 2
7   2020-06-07 00:07          2.2          125          Phase 2  Phase 2
8   2020-06-07 00:08          2.3          122          Phase 2  Phase 2
9   2020-06-07 00:09          1.6          123          Phase 2  Phase 2
10  2020-06-07 00:10          1.2           99          Phase 1  Phase 1
```
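The replacement step can be seen in isolation on a toy `tmp` series. The boolean mask below is written out by hand; it is what `hyst(tmp.values, 1.5, 2)` returns for this particular series (row 1 follows a low value, row 3 follows a high one). Only the rows equal to 1.5 take their value from the mask.

```python
import numpy as np
import pandas as pd

tmp = pd.Series([1.0, 1.5, 2.0, 1.5, 3.0])
# Hysteresis result for this series, written out by hand.
high = np.array([False, False, True, True, True])

# Keep certain rows as-is; resolve 1.5 rows to 2 (high) or 1 (low).
phase = tmp.where(tmp != 1.5, np.where(high, 2, 1))
labels = 'Phase ' + phase.astype(int).astype(str)
print(labels.tolist())  # ['Phase 1', 'Phase 1', 'Phase 2', 'Phase 2', 'Phase 3']
```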
In summary
The full code is (in addition to the `hyst()` function above):

```python
a = df['Parameter A']
b = df['Parameter B']
cat_a = pd.cut(a, [-np.inf, 1, 1.5, 2, 3, np.inf], labels=[0,1,1.5,2,3], right=False)
cat_b = pd.cut(b, [-np.inf, 120, np.inf], labels=[0,2], right=False)
tmp = pd.concat([cat_a, cat_b], axis=1).max(axis=1)
phase = tmp.where(tmp != 1.5, np.where(hyst(tmp.values, 1.5, 2), 2, 1))
df = df.assign(tmp=tmp, phase='Phase ' + phase.astype(int).astype(str))
```
Hopefully, you can adapt and extend this logic for your 36-parameter case.
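For the 36-parameter case, one possible way to scale this is to describe each parameter by its bin edges and labels in a table and take the max over all of them. The `SPECS` dict and the `compute_phase()` function are hypothetical names, and the sketch assumes that every parameter maps onto the same phase scale and that 1.5 is the only uncertain band; if different parameters need different hysteresis bands, more than one `hyst()` pass would be needed.

```python
import numpy as np
import pandas as pd

def hyst(x, th_lo, th_hi, initial=False):
    # Same vectorized hysteresis as above.
    hi = x >= th_hi
    lo_or_hi = (x < th_lo) | hi
    ind = np.nonzero(lo_or_hi)[0]
    if not ind.size:
        return np.zeros_like(x, dtype=bool) | initial
    cnt = np.cumsum(lo_or_hi)
    return np.where(cnt, hi[ind[cnt - 1]], initial)

# Hypothetical spec table: column -> (bin edges, phase label per bin).
# 1.5 marks the uncertain band. Add one entry per parameter.
SPECS = {
    'Parameter A': ([-np.inf, 1, 1.5, 2, 3, np.inf], [0, 1, 1.5, 2, 3]),
    'Parameter B': ([-np.inf, 120, np.inf], [0, 2]),
}

def compute_phase(df, specs=SPECS):
    # Bin every parameter, then keep the highest phase any of them demands.
    cats = [
        pd.cut(df[col], bins, labels=labels, right=False).astype(float)
        for col, (bins, labels) in specs.items()
    ]
    tmp = pd.concat(cats, axis=1).max(axis=1)
    # Resolve the uncertain 1.5 band from history, as in the answer.
    high = hyst(tmp.to_numpy(), 1.5, 2)
    phase = tmp.where(tmp != 1.5, np.where(high, 2, 1))
    return 'Phase ' + phase.astype(int).astype(str)

df = pd.DataFrame({
    'Parameter A': [1.0, 1.0, 1.0, 1.0, 1.5, 2.0, 2.1, 2.2, 2.3, 1.6, 1.2],
    'Parameter B': [100, 101, 99, 102, 101, 105, 120, 125, 122, 123, 99],
})
print(compute_phase(df).tolist())
```

Run on the parameter columns of the first example, it yields the same values as the "Required output" column.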
Another example
To better illustrate the phase transitions and the logic, here is another example:
```python
df = pd.DataFrame([
    [0, 0],
    [1, 100],
    [1.2, 100],
    [1.5, 100],
    [1.6, 100],
    [2, 100],
    [2.1, 100],
    [1.6, 100],
    [1.5, 100],
    [1.4, 100],
    [1.4, 120],
    [1.5, 100],
    [3, 100],
    [1.5, 100],
    [1.6, 100],
    [1.4, 100],
], columns=['Parameter A', 'Parameter B'])
```
Running the code above, and adding `tmp` to the `df` for inspection, we see (with comments added by hand):

```
>>> df.assign(tmp=tmp, phase='Phase ' + phase.astype(int).astype(str))
    Parameter A  Parameter B  tmp    phase
0           0.0            0  0.0  Phase 0
1           1.0          100  1.0  Phase 1
2           1.2          100  1.0  Phase 1
3           1.5          100  1.5  Phase 1  # in hysteresis band, but previous was low
4           1.6          100  1.5  Phase 1
5           2.0          100  2.0  Phase 2
6           2.1          100  2.0  Phase 2
7           1.6          100  1.5  Phase 2  # in hysteresis band, but previous was high
8           1.5          100  1.5  Phase 2
9           1.4          100  1.0  Phase 1
10          1.4          120  2.0  Phase 2  # goes to 2 because b >= 120
11          1.5          100  1.5  Phase 2
12          3.0          100  3.0  Phase 3
13          1.5          100  1.5  Phase 2  # note: not 3, even though previous was 3
14          1.6          100  1.5  Phase 2
15          1.4          100  1.0  Phase 1
```
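The hand-written annotations can also be checked mechanically. Below, the summary code is wrapped in a function (`assign_phase` is our name for it, not part of the original code) and run on the DataFrame above; the expected phases are copied from the table. As in the earlier sketch, the categoricals are cast to float before the max, which is our own addition.

```python
import numpy as np
import pandas as pd

def hyst(x, th_lo, th_hi, initial=False):
    hi = x >= th_hi
    lo_or_hi = (x < th_lo) | hi
    ind = np.nonzero(lo_or_hi)[0]
    if not ind.size:
        return np.zeros_like(x, dtype=bool) | initial
    cnt = np.cumsum(lo_or_hi)
    return np.where(cnt, hi[ind[cnt - 1]], initial)

def assign_phase(df):
    # The summary code, wrapped in a function for reuse.
    cat_a = pd.cut(df['Parameter A'], [-np.inf, 1, 1.5, 2, 3, np.inf],
                   labels=[0, 1, 1.5, 2, 3], right=False).astype(float)
    cat_b = pd.cut(df['Parameter B'], [-np.inf, 120, np.inf],
                   labels=[0, 2], right=False).astype(float)
    tmp = pd.concat([cat_a, cat_b], axis=1).max(axis=1)
    phase = tmp.where(tmp != 1.5, np.where(hyst(tmp.to_numpy(), 1.5, 2), 2, 1))
    return df.assign(tmp=tmp, phase='Phase ' + phase.astype(int).astype(str))

rows = [[0, 0], [1, 100], [1.2, 100], [1.5, 100], [1.6, 100], [2, 100],
        [2.1, 100], [1.6, 100], [1.5, 100], [1.4, 100], [1.4, 120],
        [1.5, 100], [3, 100], [1.5, 100], [1.6, 100], [1.4, 100]]
out = assign_phase(pd.DataFrame(rows, columns=['Parameter A', 'Parameter B']))

# Phase numbers as listed in the table above.
expected = [0, 1, 1, 1, 1, 2, 2, 2, 2, 1, 2, 2, 3, 2, 2, 1]
print((out['phase'] == ['Phase %d' % p for p in expected]).all())  # True
```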