0

Let's say I have a time series represented in a numpy array, where every 3 seconds, I get a data point. It looks something like this (but with many more data points):

z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])

I want to find a threshold where, on average, every y seconds a data point will surpass that threshold (x).

Maybe my question would be easier to understand in this sense: let's say I've gathered some data on how many ants are leaving their mound every 3 seconds. Using this data, I want to create a threshold (x) so that in the future if the number of ants leaving at one time exceeds x, my beeper will go off. Now this is the key part - I want my beeper to go off roughly every 4 seconds. I'd like to use Python to figure out what x should be given some y amount of time based on an array of data I've already collected.

Is there a way to do this in Python?

user1566200
  • 1,826
  • 4
  • 27
  • 47
  • 1
    When you sort the time series, won't it lose its *time* information? After sorting, how can you determine if the threshold has been exceeded for any time period. – wwii Jun 19 '15 at 03:00
  • @wwii is right; it's hard to understand the logic behind your matlab approach, maybe rephrase the question? – Geotob Jun 19 '15 at 03:06
  • Sorry, I can't even decipher my MATLAB code now to figure out why exactly that worked (somebody else I work with explained it to me a while ago). I'm not entirely sure that was correct, either. So I've deleted it and simplified my question, taking out the negative threshold aspect. – user1566200 Jun 19 '15 at 03:13
  • I would be building a threshold off this time series, and then applying it to a new time series in real-time (where the data is pulled from a similar distribution as the one I make the threshold from). I want to get a 'trigger' every time the threshold is passed, and I want that trigger to happen on average every 4 seconds, data-permitting. – user1566200 Jun 19 '15 at 03:14
  • you want four-second *triggers* from data that is being sampled every three seconds? – wwii Jun 19 '15 at 03:49
  • yes; though I'm talking about on average (i realize I can't get evenly spaced 4-second triggers from 3-second sampling) – user1566200 Jun 19 '15 at 19:43

2 Answers2

2

I think it is easiest to first think about this in terms of statistics. What I think you are really saying is that you want to calculate the 100*(1-m/nth) percentile, that is the number such that the value is below it 1-m/nth of the time, where m is your sampling period and n is your desired interval. In your example, it would be the 100*(1-3/4th) percentile, or 25th percentile. That is, you want the value that is exceeded 75% of the time.

So to calculate that on your data, you should use scipy.stats.scoreatpercentile. So for your case you can do something like:

>>> z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
>>> m = 3.
>>> n = 4.
>>> x = scipy.stats.scoreatpercentile(z, 100*(1-m/n))
>>> print(x)
1.05
>>> print((z>x).sum()/len(z))  # test, should be about 0.75
0.714285714286

Of course if you have a lot of values this estimate will be better.

Edit: Originally I had the percentile backwards. It should be 1-m/n, but I originally had just m/n.

TheBlackCat
  • 9,791
  • 3
  • 24
  • 31
  • Awesome thank you! Do you know how this could be done in the opposite sense (i.e. trigger only when the value is _below_ the threshold? – user1566200 Jun 19 '15 at 19:18
  • Something is backwards here - if I raise the desired interval, `n`, then `z > x` a much larger portion of the time. My threshold should get _larger_ when I raise `n` (and thus `z > x` should occur less frequently), but this does the opposite. – user1566200 Jun 19 '15 at 19:37
  • It is the same, just use less than instead of greater than when checking each new sample. – TheBlackCat Jun 19 '15 at 20:21
  • ```z``` represents 42 seconds of data. The goal is to exceed the threshold every 4 seconds on average, so there should be ~10 for ```z``` - this solution produces 4 or ~one every 10 seconds. Using 25th percentile would produce 10 triggers for ```z```. – wwii Jun 20 '15 at 03:30
  • @user1566200 Sorry, I had the percentile backwards. It should be `100*(1-m/n)`, not `100*m/n`. The answer has been fixed. – TheBlackCat Jun 20 '15 at 18:04
  • @wwii Yes, I noticed that too. I had the percentile backwards. I already fixed it in the answer. – TheBlackCat Jun 20 '15 at 18:05
  • I'm trying it with 100*(1-m/n) but when moving from m=2 to m=3, everything becomes false. – user1566200 Jun 21 '15 at 19:07
  • @user1566200 It works fine for me. Are you using Python 2.x? In python 2.x, division involving integers automatically rounds down to the next lowest integer (so `2/3 == 0`). You should always use `from __future__ import division` to get division working correctly. This is not an issue in Python 3.x. – TheBlackCat Jun 22 '15 at 16:30
1

Assuming that one second resolution for the trigger is ok...

import numpy as np
z = np.array([1, 2, 1, 2.2, 3, 4.4, 1, 1.2, 2, 3, 2.1, 1.2, 5, 0.5])
period = 3

Divide each sample point by its period (in seconds) and create an array of one second data - assumes linear distribution(?) for each sample.

y = np.array([[n]*period for n in z / period])
y = y.flatten()

Reshape the data into four second periods (lossy)

h = len(y) % 4
x = y[:-h]
w = x.reshape((4, len(x) / 4))

Find the sum of each four second period and find the minimum of these intervals

v = w.sum(axis = -1)
# use the min value of these sums
threshold = v.min() # 2.1

This gives a gross threshold for non-overlapping four-second chunks- however it only produces 6 triggers for z which represents 42 seconds of data.


Use overlapping, rolling windows to find the minimum value of the sums of each four-second window (lossless)

def rolling(a, window, step = 1):
    """

    Examples
    --------
    >>> a = np.arange(10)
    >>> print rolling(a, 3)
    [[0 1 2]
     [1 2 3]
     [2 3 4]
     [3 4 5]
     [4 5 6]
     [5 6 7]
     [6 7 8]
     [7 8 9]]
    >>> print rolling(a, 4)
    [[0 1 2 3]
     [1 2 3 4]
     [2 3 4 5]
     [3 4 5 6]
     [4 5 6 7]
     [5 6 7 8]
     [6 7 8 9]]
    >>> print rolling(a, 4, 2)
    [[0 1 2 3]
     [2 3 4 5]
     [4 5 6 7]
     [6 7 8 9]]
    >>>

    from http://stackoverflow.com/a/12498122/2823755
    """
    shape = ( (a.size-window)/step + 1   , window)
    strides = (a.itemsize*step, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

t = rolling(y, 4)
s = t.sum(axis = -1)
threshold = s.min() # 1.3999999

This will produce 8 triggers for z.

wwii
  • 23,232
  • 7
  • 37
  • 77