I have the following array:

array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])

and would like to apply two thresholds, such that all values below -1.0 are set to 1 and all values above -0.3 are set to 0. For the values in between, the following rule should apply: if the last value was below -1.0, it should be a 1, but if the last value was above -0.3, it should be a 0.

For the example array above, the output should be

target = np.array([0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0])

If multiple consecutive values are between -1.0 and -0.3, it should go back as far as required until it finds a value outside the two thresholds, and set the output accordingly.

I tried to achieve this by iterating over the array and using a while loop inside the for loop to find the next occurrence where the value is above the threshold, but it doesn't work:

array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])

p = []

def function(array, p):
    for i in np.nditer(array):
       if i < -1: 
          while i <= -0.3:
            p.append(1)
            i += 1
          else:
            p.append(0)
            i += 1
    return p

a = function(array, p)
print(a)

How can I apply the two thresholds to my array as described above?

hbaderts

1 Answer

What you are trying to achieve is called "thresholding with hysteresis". For this, I adapted the very nice algorithm from this answer:

Given your test data,

import numpy as np
array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])

you detect which values are at or below the first threshold -1.0, and which are at or above the second threshold -0.3:

low_values = array <= -1.0
high_values = array >= -0.3

These are the values for which you know the result: either 1 or 0. For every other value, the result depends on the values before it. Thus, all values for which either low_values or high_values is True are known. You can get the indices of all known elements with:

known_values = high_values | low_values
known_idx = np.nonzero(known_values)[0]
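As a quick check, the masks and indices for the test data can be printed. This is only a sketch that repeats the lines above on the example array; it assumes nothing beyond NumPy.

```python
import numpy as np

array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])

low_values = array <= -1.0    # result is known to be 1 here
high_values = array >= -0.3   # result is known to be 0 here

# Positions where one of the two thresholds decides the result directly.
known_values = high_values | low_values
known_idx = np.nonzero(known_values)[0]

print(known_values.astype(int).tolist())  # [0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(known_idx.tolist())                 # [1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12]
```

Only positions 0 and 3 (both -0.5) fall between the thresholds, so they are the only ones missing from known_idx.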

To find the result for all unknown values, we use the np.cumsum function on the known_values array. The Booleans are interpreted as 0 or 1, so this gives a running count of the known values seen so far:

acc = np.cumsum(known_values)

which will result in the following for your example: [ 0 1 2 2 3 4 5 6 7 8 9 10 11]. Now, known_idx[acc - 1] will contain the index of the last known value for each point. With low_values[known_idx[acc - 1]] you get a True if the last known value was below -1.0 and a False if it was above -0.3:

result = low_values[known_idx[acc - 1]]
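To make the lookup concrete, the intermediate arrays can be inspected on the test data. This is a sketch; the name last_known is introduced here only for illustration and is not part of the original code.

```python
import numpy as np

array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
low_values = array <= -1.0
high_values = array >= -0.3
known_values = high_values | low_values
known_idx = np.nonzero(known_values)[0]
acc = np.cumsum(known_values)

# For each position: index of the most recent known value.
# Position 0 has acc == 0, so acc - 1 == -1 and the lookup wraps
# around to the LAST known index (12) instead of an earlier one.
last_known = known_idx[acc - 1]
print(last_known.tolist())  # [12, 1, 2, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12]

result = low_values[last_known]
print(result.astype(int).tolist())  # [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

Position 3 (value -0.5) looks up index 2 (value -1), which is why it becomes 1. Position 0 only comes out as 0 here because the array happens to end with a value above -0.3.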

There is one problem left: if the initial value is at or below -1.0 or at or above -0.3, everything works out fine. But if it is in between, it would depend on its left neighbor, which it doesn't have. The same holds for every in-between value that comes before the first known one. In your case, you simply define these to be zero.

We can find them by checking where acc equals 0. There, no known value has been seen yet, so acc - 1 is -1 and the lookup wraps around to the last known value in the array. Setting those positions to False fixes that:

result[acc == 0] = False
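The wrap-around is easier to see on an array that starts in between the thresholds and ends below -1.0. The three-element array short below is a made-up example, not data from the question. Note that this sketch resets every position where acc is 0 rather than only the first element, so that several leading in-between values are covered as well.

```python
import numpy as np

# Hypothetical input: starts in between, ends below the low threshold.
short = np.array([-0.5, 0.0, -2.0])

low_values = short <= -1.0
high_values = short >= -0.3
known_values = high_values | low_values
known_idx = np.nonzero(known_values)[0]
acc = np.cumsum(known_values)

result = low_values[known_idx[acc - 1]]
# acc[0] == 0, so index -1 picked up the last known value (-2.0).
print(result.astype(int).tolist())  # [1, 0, 1]

# Positions with acc == 0 have no earlier known value: define them as 0.
result[acc == 0] = False
print(result.astype(int).tolist())  # [0, 0, 1]
```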

Finally, as we were doing lots of comparisons, our result array is a boolean array. To convert it to integer 0 and 1, we simply call

result = np.int8(result)

and we get our desired result:

array([0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0], dtype=int8)
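For reuse, the steps can be collected into one function. This is a minimal sketch under a few assumptions of mine: the names hysteresis_threshold and hysteresis_loop and the parameters low, high and initial are not from the question, leading in-between values are set to initial, and an input with no known values at all is filled with initial as well (without that guard, known_idx would be empty and the lookup would raise an IndexError). The plain loop is included only as a readable reference to compare against.

```python
import numpy as np

def hysteresis_threshold(values, low=-1.0, high=-0.3, initial=False):
    """Vectorized hysteresis: 1 at or below `low`, 0 at or above `high`,
    otherwise the state of the most recent value outside the band."""
    values = np.asarray(values)
    low_values = values <= low
    high_values = values >= high
    known_values = low_values | high_values
    known_idx = np.nonzero(known_values)[0]
    if known_idx.size == 0:
        # Assumption: nothing ever crosses a threshold, keep the initial state.
        return np.full(values.shape, initial, dtype=np.int8)
    acc = np.cumsum(known_values)
    result = low_values[known_idx[acc - 1]]
    # Values before the first known one have no history to fall back on.
    result[acc == 0] = initial
    return result.astype(np.int8)

def hysteresis_loop(values, low=-1.0, high=-0.3, initial=False):
    """Reference implementation with an explicit state variable."""
    state = initial
    out = []
    for v in values:
        if v <= low:
            state = True
        elif v >= high:
            state = False
        out.append(int(state))
    return np.array(out, dtype=np.int8)

array = np.array([-0.5, -2, -1, -0.5, -0.25, 0, 0, -2, -1, 0.25, 0.5, 1, 2])
print(hysteresis_threshold(array).tolist())
# [0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0]
```

The loop version is slower on large arrays but states the rule directly, which makes it useful for testing the vectorized version on random data.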
hbaderts
  • This solution is correct. I suppose I was thinking of the problem incorrectly, but did not know about thresholding with hysteresis. Thank you! – Christopher Goings Jun 06 '18 at 13:34