How to count consecutive periods in a timeseries above/below threshold?

Question

I have a dataset of values covering one year, and I want to detect and count the periods of consecutive values above/below a pre-specified threshold. All I need returned is the length of each period of consecutive above/below-threshold values. I found code online that does almost exactly what I want (shown below, the function titled "fire_season_length"), except that it fails to return the final consecutive period before the dataset ends (at the end of the year).

I believe this happens because a period of consecutive values is only reported once the series flips from above (below) threshold to below (above) threshold.

Here is the function I am using to count consecutive above/below-threshold periods:

def fire_season_length(ts, threshold):

    ntot_ts = ts.count() #total number of values in ts (timeseries)
    n_gt_threshold = ts[ts >= threshold].count() #number of values greater than threshold

    type_day = 0 #below threshold
    type_day = 1 #meets or exceeds threshold

    type_prev_day = 0 #initialize first day 
    storage_n_cons_days = [[],[]]   #[[cons days above threshold], [cons days below threshold]]
    n_cons_days = 0

    for cur_day in ts: #current day in timeseries

        if cur_day >= threshold:
            type_cur_day = 1
            if type_cur_day == type_prev_day: #if same as previous day
                n_cons_days += 1
            else: #if not same as previous day
                storage_n_cons_days[1].append(n_cons_days)
                n_cons_days = 1
            type_prev_day = type_cur_day
        else:
            type_cur_day = 0
            if type_cur_day == type_prev_day:
                n_cons_days += 1
            else:
                storage_n_cons_days[0].append(n_cons_days)
                n_cons_days = 1
            type_prev_day = type_cur_day



    return ntot_ts, n_gt_threshold, storage_n_cons_days

This is the output when I run a timeseries through the function. I've annotated the plot to show that there are 7 periods of consecutive values, yet the returned array [[13,185,30], [24, 78, 12]] (which indicates [[periods above threshold],[periods below threshold]]) lists only 6 such periods. Period 7 is not reported in the output, and the same happens with the other timeseries I tested in this function.

[Figure: annotated plot of the timeseries showing the 7 periods of consecutive values]

So my question is: how do I get my code to return the final period of consecutive values, even though the series of values has not flipped to be of the other sign (above/below threshold)?

Alain T. · Accepted Answer · 2022-01-19T17:53:22.270

You can do this using a combination of accumulate() and Counter():

import random
from itertools import accumulate
from collections import Counter

ts = [ random.randint(1,100) for _ in range(15) ]

threshold = 50
groups = accumulate([0]+[(a>=threshold) != (b>=threshold) for a,b in zip(ts,ts[1:])])
counts = sorted(Counter(groups).items())
above  = [ c for n,c in counts if (n%2==0) == (ts[0]>=threshold) ]
below  = [ c for n,c in counts if (n%2==0) != (ts[0]>=threshold) ]

print("data ",ts)
print("above",above)
print("below",below)

example output:

data  [99, 49, 84, 69, 27, 88, 35, 43, 3, 48, 80, 14, 32, 97, 78]
above [1, 2, 1, 1, 2]
below [1, 1, 4, 2]

The way this works is as follows:

First, identify the positions where a change between above and below occurs.
State changes are marked True (1) and unchanged positions are marked False (0).
The cumulative sum of these 1s and 0s produces a series that takes a new value at each state change and repeats that value at positions with no state change.
The Counter class then counts how many times each repeated value occurs, which is the length of each run of consecutive identical states.
Sorting the counts restores the chronological order of the runs.
Depending on the state of the first item, the even values all correspond to either the above or the below state, and the odd values correspond to the opposite state.
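
To make these steps concrete, here is a small trace of the intermediate values, using the data from the example output above rather than random numbers. The names flips and group_ids are introduced only for illustration and do not appear in the answer's code.

```python
from itertools import accumulate
from collections import Counter

ts = [99, 49, 84, 69, 27, 88, 35, 43, 3, 48, 80, 14, 32, 97, 78]
threshold = 50

# True (1) wherever a value's state differs from the previous value's state
flips = [(a >= threshold) != (b >= threshold) for a, b in zip(ts, ts[1:])]

# Running total of flips: all values in the same run share one group id
group_ids = list(accumulate([0] + flips))

# Run length per group id, sorted back into chronological order
counts = sorted(Counter(group_ids).items())

print(group_ids)  # [0, 1, 2, 2, 3, 4, 5, 5, 5, 5, 6, 7, 7, 8, 8]
print(counts)     # [(0, 1), (1, 1), (2, 2), (3, 1), (4, 1), (5, 4), (6, 1), (7, 2), (8, 2)]
```

Because ts[0] is above the threshold here, the even group ids (0, 2, 4, 6, 8) are the above runs, giving [1, 2, 1, 1, 2], and the odd ids are the below runs, giving [1, 1, 4, 2].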

[EDIT] A more straightforward approach is to use groupby keyed on whether each value is at or above (True) or below (False) the threshold:

from itertools import groupby

threshold = 50
changes = [ (c,len([*g])) for c,g in groupby(ts,lambda t:(t>=threshold))]

print('above:',[n for above,n in changes if above])
print('below:',[n for above,n in changes if not above])

above: [1, 2, 1, 1, 2]
below: [1, 1, 4, 2]
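
As a rough sketch of how this could replace the counting part of fire_season_length, the groupby version can be wrapped in a function that returns the same [[above], [below]] layout described in the question's comment. groupby yields every group, including the last one, so the final period is reported without any special handling. For comparison, the second function keeps an explicit loop and records the last run with one extra append after the loop ends. The names run_lengths and run_lengths_loop are hypothetical, and both functions assume ts is any iterable of numbers that can be compared with the threshold.

```python
from itertools import groupby

def run_lengths(ts, threshold):
    """Return [[above-run lengths], [below-run lengths]] in chronological order."""
    runs = [[], []]  # index 0: at/above threshold, index 1: below threshold
    for is_above, group in groupby(ts, key=lambda v: v >= threshold):
        runs[0 if is_above else 1].append(sum(1 for _ in group))
    return runs

def run_lengths_loop(ts, threshold):
    """Same result with an explicit loop that flushes the final run."""
    runs = [[], []]
    prev_above = None  # no previous value seen yet
    n = 0
    for value in ts:
        is_above = value >= threshold
        if is_above == prev_above:
            n += 1
        else:
            # State changed: store the run that just ended (if there was one)
            if prev_above is not None:
                runs[0 if prev_above else 1].append(n)
            n = 1
            prev_above = is_above
    # The last run never sees a state change, so store it here
    if prev_above is not None:
        runs[0 if prev_above else 1].append(n)
    return runs

data = [99, 49, 84, 69, 27, 88, 35, 43, 3, 48, 80, 14, 32, 97, 78]
print(run_lengths(data, 50))       # [[1, 2, 1, 1, 2], [1, 1, 4, 2]]
print(run_lengths_loop(data, 50))  # [[1, 2, 1, 1, 2], [1, 1, 4, 2]]
```

Starting prev_above at None also avoids storing a zero-length run when the series begins above the threshold.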
