For a school project, I need to implement the following function:

Make a function find_intervals(s, threshold) that receives in input a Series s and a threshold value.

Find the contiguous periods when the signal is above the given threshold.

The function should return a Series that has as index the start date for each contiguous period and has as associated value the period length expressed in a number of days. The result should be sorted in descending order of period length.

When applied to a signal such as this (orange line with threshold=0):

[image: plot of the example signal]

it should return the following Series:

70     35
140    35
1      34
Name: interval, dtype: int64

that is, the largest interval is 35 units and it starts at label 70, then there is another interval of length 35 that starts at 140, etc. In the exercise, the index will be a date and the length of the interval is expressed in days.

I have written the following function (with the help of this Stack Overflow answer):

def intervals(samples,threshold):
    samples = np.array(samples)
    start = -1
    intervals = []
    for idx,x in enumerate(samples):
        if start < 0 and abs(x) < threshold:
            start = idx
        elif start >= 0 and abs(x) >= threshold:
            dur = idx-start
            if dur >= 0:
                intervals.append((start))
            start = -1
    return intervals

However, when I call this function on a similar sine wave, it doesn't work for a threshold of 0 or for any negative threshold. I couldn't figure out why.

Edit: Here's what I tried and the results I got.

With the following code, I plotted a simple sine wave:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

x = np.arange(0, 64*np.pi, 1)
y = np.sin(x/11)
df = pd.Series(data=y, index=x)
plt.plot(x, y)
df = np.array(df)


When I run the code with intervals(df, 0.5), I get [0, 34, 69, 103, 138, 172], which is expected.

However, if I call intervals(df, 0), I get an empty list, and the same happens for any negative threshold value.

Arghavan
    `abs(x) >= threshold` Here, you're taking the absolute value of x. If you use a threshold of 0, then the comparison `abs(x) >= threshold` will always be true. – Nick ODell Nov 15 '20 at 00:57
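The point made in the comment can be checked directly. The following is a minimal sketch, assuming the same sine wave as in the question; the variable names `below_zero` and `above_zero` are chosen here for illustration only. Because `abs(x)` is never negative, the condition `abs(x) < threshold` that sets `start` can never hold for a threshold of 0 or less, so the loop never opens an interval.

```python
import numpy as np

# Same signal as in the question.
x = np.arange(0, 64 * np.pi, 1)
y = np.sin(x / 11)

# abs(y) is never negative, so no sample satisfies abs(y) < 0.
# This is the condition that would set `start` in the question's loop.
below_zero = np.abs(y) < 0
print(below_zero.any())  # False

# Comparing the signed values instead does find samples above the threshold.
above_zero = y > 0
print(above_zero.any())  # True
```

Dropping `abs` and comparing the signed value with the threshold is what makes zero and negative thresholds meaningful.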

1 Answer


Change your function to:

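def find_intervals2(samples, threshold):
    # Keep only the samples at or above the threshold.
    samp = samples[samples >= threshold]
    # Start a new group wherever a label does not follow the previous one by 1,
    # then record (first label, number of samples) for each group.
    xx = samp.groupby((samp.index != samp.index.to_series().shift() + 1)
        .cumsum()).apply(lambda grp: (grp.index[0], grp.size))
    # Index = start label, value = run length, longest run first.
    return pd.Series(xx.str[1].values, index=xx.str[0]).sort_values(ascending=False)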

Note that the result is a Series, not a list.
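To make the grouping key used in the function more concrete, here is a minimal sketch on a small toy Series. The data and the names `labels`, `new_run` and `run_id` are hypothetical, chosen only for illustration. After filtering, the remaining labels have gaps; a label that does not equal its predecessor plus 1 marks the start of a new run, and the cumulative sum of those marks gives each run its own id.

```python
import pandas as pd

# Hypothetical toy data: labels 0..6, values picked for illustration only.
s = pd.Series([1.0, 2.0, -1.0, -2.0, 3.0, 4.0, 5.0], index=range(7))

samp = s[s >= 0]                      # labels 0, 1, 4, 5, 6 remain
labels = samp.index.to_series()

# True where a label does not follow its predecessor by exactly 1.
new_run = labels != labels.shift() + 1
run_id = new_run.cumsum()

print(run_id.tolist())                      # [1, 1, 2, 2, 2]
print(samp.groupby(run_id).size().tolist()) # [2, 3]
```

This only works when the labels are integers that increase by 1 between consecutive samples, which is the case for the examples shown here.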

To present a more instructive example, define the source Series as:

x = np.arange(0, 68 * np.pi, dtype=int)
y2 = np.sin(x / 11 * (1000 - x) // 7 / 142)
s2 = pd.Series(data=y2, index=x)
plt.plot(s2)
plt.grid(True);

Note the "stepwise decreasing" frequency of the plot.

Then, when you run find_intervals2(s2, -0.2), the result is:

162    52
72     48
0      39
dtype: int64
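The question mentions that in the actual exercise the index will be dates and the lengths are in days, while the examples here use integer labels. One possible adaptation is sketched below. It assumes a daily DatetimeIndex with no missing days, uses a strict `>` comparison to match "above the threshold", and the function name `find_intervals_dates` is hypothetical.

```python
import pandas as pd

def find_intervals_dates(s, threshold):
    """Sketch: runs of consecutive days with s > threshold, longest first."""
    above = s > threshold
    # A run starts where the current sample is above and the previous one is not.
    starts = above & ~above.shift(fill_value=False)
    # Run id for every sample that is above the threshold.
    run_id = starts.cumsum()[above]
    # Group the dates of the "above" samples by run id.
    days = s.index[above.to_numpy()].to_series().groupby(run_id.to_numpy())
    # Length in days, counting both the first and the last day of the run.
    lengths = (days.max() - days.min()).dt.days + 1
    lengths.index = pd.DatetimeIndex(days.min().to_numpy())
    return lengths.rename("interval").sort_values(ascending=False)

# Usage on hypothetical data: ten consecutive days.
idx = pd.date_range("2020-01-01", periods=10, freq="D")
s = pd.Series([1, 1, -1, 1, 1, 1, -1, -1, 1, -1], index=idx)
result = find_intervals_dates(s, 0)
print(result)
```

For this toy input the longest run starts on 2020-01-04 and lasts 3 days, followed by a 2-day run starting on 2020-01-01 and a 1-day run on 2020-01-09.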
Valdi_Bo