
I have a NumPy array of N time series, each of length T. For each series I want the index at which it first crosses some threshold, or -1 (or a similar sentinel) if it never crosses. Take ts_array = np.random.randn(N, T)

np.argmax(ts_array > cutoff, axis=1) gets close, but it returns 0 both for time series that cross the threshold at time 0 and for time series that never cross.
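To make the ambiguity concrete, here is a small sketch with made-up values (the array contents and the cutoff of 0.5 are illustrative assumptions, not data from my real problem):

```python
import numpy as np

cutoff = 0.5  # illustrative threshold
ts_array = np.array([
    [0.9, 0.1, 0.2],  # crosses at index 0
    [0.1, 0.2, 0.3],  # never crosses
    [0.1, 0.7, 0.9],  # crosses at index 1
])

# argmax of an all-False row is 0, the same as a crossing at time 0
first_idx = np.argmax(ts_array > cutoff, axis=1)
print(first_idx)  # [0 0 1]
```

Rows 0 and 1 both come back as 0, even though only row 0 actually crosses.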

np.where(...) and np.nonzero(...) are possibilities, but their return values would require rather gruesome handling to extract the vector I'm interested in.

This question is similar to "Numpy first occurrence of value greater than existing value", but none of the answers there solve it.

MHankin

2 Answers


One liner:

(ts > c).argmax() if (ts > c).any() else -1

assuming ts = ts_array (a 1-D array, as in the example below) and c = cutoff

Otherwise:

Use argmax() and any()

import numpy as np

np.random.seed([3,1415])

def xover(ts, cut):
    x = ts > cut
    return x.argmax() if x.any() else -1

ts_array = np.random.random(5).round(4)

ts_array looks like:

print(ts_array, '\n')

[ 0.4449  0.4076  0.4601  0.4652  0.4627] 

Various checks:

print(xover(ts_array, 0.400), '\n')

0 

print(xover(ts_array, 0.460), '\n')

2 

print(xover(ts_array, 0.465), '\n')

3 

print(xover(ts_array, 1.000), '\n')

-1 
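For a 2D array with one series per row, the same argmax() plus any() idea can be applied along axis=1. The following is only a sketch under that assumption; the function name first_crossing and the sample values are hypothetical and not part of the code above:

```python
import numpy as np

def first_crossing(ts_array, cutoff):
    """Per row: index of the first value above cutoff, or -1 if none."""
    above = np.asarray(ts_array) > cutoff
    # any(axis=1) flags rows that cross at all; argmax(axis=1) gives the
    # first True in each row and is only trusted where a crossing exists
    return np.where(above.any(axis=1), above.argmax(axis=1), -1)

# Illustrative data: one series per row
ts_2d = np.array([
    [0.9, 0.1, 0.2],  # crosses at index 0
    [0.1, 0.2, 0.3],  # never crosses
    [0.1, 0.7, 0.9],  # crosses at index 1
])
result = first_crossing(ts_2d, 0.5)
print(result)  # [ 0 -1  1]
```

The boolean mask is computed once and reused, for the same reason x is assigned in xover.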
piRSquared
  • Thanks, but (as mentioned in my comment to SNygard) ts_array is a 2D structure so I'd need to map this to each row. That's doable, I'd just rather have a nice vectorized one-liner, but that may just not exist. – MHankin Jun 01 '16 at 22:49
  • @MHankin This is a vectorized solution. It could also be a one-liner. One-liner solutions are often harder to read and less efficient. For example, with my solution, I could have written `(ts > c).argmax() if (ts > c).any() else -1` given `ts = ts_array` and `c = cutoff`. `(ts > c).argmax()` is vectorized as is `(ts > c).any()`. I gain advantage by assigning `x = ts > c` so I don't have to calculate twice. I'll update the answer with the one-liner equivalent, but it is less efficient. – piRSquared Jun 01 '16 at 23:25

It's not too bad with np.where. I would use the following as a starting point:

import numpy as np

ts_array = np.random.rand(10, 10)
cutoff = 0.5

# Get a list of all indices that satisfy the condition
rows, cols = np.where(ts_array > cutoff)
if len(rows) > 0:
    index = (rows[0], cols[0])
else:
    index = -1

Note that for a 2D input np.where returns two arrays: the row indices and the column indices. They are matched element by element and listed in row-major order, so taking the first entry of each gives the first position, scanning row by row, where the value is above the cutoff. I don't have a nice one-liner, but the handling code isn't too bad. It should be easily adaptable to your situation.
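If a per-row result is what you need, one possible adaptation is to group the np.where output by row. This is a rough sketch rather than a tested recipe; the name first_crossing_where and the sample data are made up for illustration:

```python
import numpy as np

def first_crossing_where(ts_array, cutoff):
    """Per row: first column index above cutoff, or -1 if the row never crosses."""
    ts_array = np.asarray(ts_array)
    rows, cols = np.where(ts_array > cutoff)
    # Indices come back in row-major order, so the first time a row number
    # appears in `rows` lines up with that row's smallest column index
    hit_rows, first_pos = np.unique(rows, return_index=True)
    result = np.full(ts_array.shape[0], -1, dtype=int)
    result[hit_rows] = cols[first_pos]
    return result

# Illustrative data: one series per row
sample = np.array([
    [0.9, 0.1, 0.2],  # crosses at index 0
    [0.1, 0.2, 0.3],  # never crosses
    [0.1, 0.7, 0.9],  # crosses at index 1
])
per_row = first_crossing_where(sample, 0.5)
print(per_row)  # [ 0 -1  1]
```

Rows that never cross simply do not appear in rows, so they keep the -1 they were initialized with.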

SNygard
  • Thanks, but ts_array is a 2D structure (see the example initialization I added to my question) so I'd need to map this to each row, which just compounds the ugliness, as `where(...)` on a 2D boolean array returns 2 1D vectors specifying the row and column indices. Parsing that would be gruesome. – MHankin Jun 01 '16 at 22:46
  • I changed my code to reflect the 2D `np.where` solution. If you want a list of index pairs that pass the threshold, you can use `list(zip(*np.where(ts_array > cutoff)))`. How are you selecting which is the *first* instance? – SNygard Jun 01 '16 at 23:16