
I have a dataset of values covering a one-year period, and I want to detect and count the periods of consecutive values above or below a pre-specified threshold. All I need returned is the length of each such period. I found code online that does almost exactly this (the function titled "fire_season_length", shown below), except that it fails to return the final consecutive period before the dataset ends (at the end of the year).

I believe the problem is that a period of consecutive values is only recorded once the series flips from above the threshold to below it (or vice versa), so a period that runs to the end of the data is never recorded.

Here is the function I am using to count consecutive above/below-threshold periods:

def fire_season_length(ts, threshold):

    ntot_ts = ts.count() #total number of values in ts (timeseries)
    n_gt_threshold = ts[ts >= threshold].count() #number of values that meet or exceed threshold

    #day types used below:
    #  0 = below threshold
    #  1 = meets or exceeds threshold

    type_prev_day = 0 #initialize first day 
    storage_n_cons_days = [[],[]]   #[[cons days above threshold], [cons days below threshold]]
    n_cons_days = 0

    for cur_day in ts: #current day in timeseries

        if cur_day >= threshold:
            type_cur_day = 1
            if type_cur_day == type_prev_day: #if same as previous day
                n_cons_days += 1
            else: #if not same as previous day
                storage_n_cons_days[1].append(n_cons_days)
                n_cons_days = 1
            type_prev_day = type_cur_day
        else:
            type_cur_day = 0
            if type_cur_day == type_prev_day:
                n_cons_days += 1
            else:
                storage_n_cons_days[0].append(n_cons_days)
                n_cons_days = 1
            type_prev_day = type_cur_day



    return ntot_ts, n_gt_threshold, storage_n_cons_days

This is the output when I run a timeseries through the function. I've annotated the plot to show that there are 7 periods of consecutive values, yet the returned array [[13, 185, 30], [24, 78, 12]] (which indicates [[periods above threshold], [periods below threshold]]) lists only six. Period 7 is not reported, which is consistent with the output for the other timeseries I tested with this function. See the annotated plot.

So my question is: how do I get my code to return the final period of consecutive values, even though the series of values has not flipped to be of the other sign (above/below threshold)?

Ben B

1 Answer


You can do this using a combination of accumulate() and Counter():

import random
from itertools import accumulate
from collections import Counter

ts = [ random.randint(1,100) for _ in range(15) ]

threshold = 50
groups = accumulate([0]+[(a>=threshold) != (b>=threshold) for a,b in zip(ts,ts[1:])])
counts = sorted(Counter(groups).items())
above  = [ c for n,c in counts if (n%2==0) == (ts[0]>=threshold) ]
below  = [ c for n,c in counts if (n%2==0) != (ts[0]>=threshold) ]

print("data ",ts)
print("above",above)
print("below",below)

example output:

data  [99, 49, 84, 69, 27, 88, 35, 43, 3, 48, 80, 14, 32, 97, 78]
above [1, 2, 1, 1, 2]
below [1, 1, 4, 2]

The way this works is as follows:

  • First, identify the positions where the state changes between above and below the threshold.
  • Each state change is marked True (1), and each position with no change is marked False (0).
  • The cumulative sum of these 1s and 0s produces a new value at every state change and repeats that value for the positions that follow without a change, so each run of consecutive values gets its own group number.
  • The Counter class then counts how many times each group number occurs, which is the length of the corresponding run.
  • Sorting the counts by group number restores the chronological order of the runs.
  • Because the runs alternate, the even group numbers all share the state of the first item (above or below) and the odd group numbers are the opposite state.
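To make the steps above concrete, here is a small sketch that prints each intermediate result for the first six values of the example data. The variable names flips, group_ids and run_lengths are illustrative only and do not appear in the snippet above.

```python
from collections import Counter
from itertools import accumulate

ts = [99, 49, 84, 69, 27, 88]  # first six values of the example data
threshold = 50

# True wherever the above/below state differs from the previous value
flips = [(a >= threshold) != (b >= threshold) for a, b in zip(ts, ts[1:])]
print(flips)        # [True, True, False, True, True]

# Running total of the flips: one group number per run of equal state
group_ids = list(accumulate([0] + flips))
print(group_ids)    # [0, 1, 2, 2, 3, 4]

# (group number, run length) pairs in chronological order
run_lengths = sorted(Counter(group_ids).items())
print(run_lengths)  # [(0, 1), (1, 1), (2, 2), (3, 1), (4, 1)]
```

The first value (99) is above the threshold, so the even group numbers 0, 2 and 4 are the above-threshold runs, with lengths 1, 2 and 1.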

[EDIT] A more straightforward approach is to use groupby keyed on whether each value is at or above (True) or below (False) the threshold:

from itertools import groupby

threshold = 50
changes = [ (c,len([*g])) for c,g in groupby(ts,lambda t:(t>=threshold))]

print('above:',[n for above,n in changes if above])
print('below:',[n for above,n in changes if not above])

above: [1, 2, 1, 1, 2]
below: [1, 1, 4, 2]
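If you want the result in the same [[above], [below]] shape that fire_season_length returns, the groupby idea can be wrapped in a function. This is only a sketch: the name consecutive_run_lengths is made up for illustration, and it assumes ts is a plain iterable of numbers with no missing values.

```python
from itertools import groupby

def consecutive_run_lengths(ts, threshold):
    """Return [[above-run lengths], [below-run lengths]] in chronological order.

    Hypothetical helper; assumes ts is an iterable of numbers without gaps.
    """
    runs = [[], []]  # index 0: at/above threshold, index 1: below threshold
    for is_above, group in groupby(ts, key=lambda v: v >= threshold):
        # sum(1 for _ in group) counts the items in this run
        runs[0 if is_above else 1].append(sum(1 for _ in group))
    return runs

data = [99, 49, 84, 69, 27, 88, 35, 43, 3, 48, 80, 14, 32, 97, 78]
print(consecutive_run_lengths(data, 50))
# → [[1, 2, 1, 1, 2], [1, 1, 4, 2]]
```

Because groupby emits its last group when the input is exhausted, the final run (97, 78 here) is included even though the series never flips back below the threshold.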
Alain T.