Finding maximum index of a specific value where it occurs consecutively more than N times

Question

I have an array like this

sample = np.array([[9.99995470e-01],
                   [9.99992013e-01],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [9.99775827e-01],
                   [9.99439061e-01],
                   [9.98361528e-01],
                   [9.96853650e-01],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [1.00000000e+00],
                   [9.99999762e-01]])

I want to get the max index where the values = 1 and it occurs consecutively more than 5 times. So the output should be index no 15.

I wonder if there is a simple function to solve this

I guess you need to find `[1, 1, 1, 1, 1]` instead of just `== 1` check this [answer](https://stackoverflow.com/questions/7100242/python-numpy-first-occurrence-of-subarray) — Grijesh Chauhan, May 12 '20 at 11:33

DarrylG · Answer 1 · 2020-05-12T13:49:30.427

Using groupby

Code

import numpy as np
from itertools import groupby

def find_max_index(arr):

  # consecutive runs of ones
  # Use enumerate so we have the index with each value
  run_ones = [list(v) for k, v in groupby(enumerate(sample.flatten()), lambda x: x[1]) if k == 1]

  # Sorting by length to insure that max is at end of the list of lists
  # Since this is a stable last item will still be the largest index
  run_ones.sort(key=len) 

  last_list = run_ones[-1]
  if len(last_list) > 5:        # need max to have at least a run of five
    return last_list[-1][0]     # index of last value in max run of ones
  else:
    return None

print(find_max_index(sample))

# Output: 15

Explanation

function find_max_index

groupby keeps groups runs of ones in sublist. Each item is index, value pair (from enumerate)

run_ones = [[(2, 1.0), (3, 1.0), (4, 1.0), (5, 1.0)], [(10, 1.0), (11, 1.0), (12, 1.0), (13, 1.0), (14, 1.0), (15, 1.0)]]
Sort list to insure max is at end

run_ones: [[(2, 1.0), (3, 1.0), (4, 1.0), (5, 1.0)], [(10, 1.0), (11, 1.0), (12, 1.0), (13, 1.0), (14, 1.0), (15, 1.0)]]
Last list containing run of ones

last_list: [(10, 1.0), (11, 1.0), (12, 1.0), (13, 1.0), (14, 1.0), (15, 1.0)]
Index of the last tuple in last_list

last_list[-1][0]

It should be not just the biggest index of one but the highest index of the longest window of ones. — Dmytro, May 12 '20 at 11:56

score 1 · Answer 2 · answered May 12 '20 at 11:38

1

This should give you the index of the last appearance of 1 in a group of 5.

Input:

max([index for index, window in enumerate(windowed(sample, 5)) if list(window) == [1]*5]) + 4

Output:

answered May 12 '20 at 11:38

Noah Smith

187
4

What is `windowed`? Do you import it from somewhere? – Georgy May 12 '20 at 15:19

score 1 · Answer 3 · answered May 12 '20 at 11:41

Here is a function that will solve your problems for you

def find_repeated_index(sample, min_value, min_repeats):
  max_index = -1
  history   = []
  for index, value in enumerate(np.array(sample).flatten()):
    if value >= min_value: 
        history.append(value)
        if len(history) >= min_repeats: max_index = index
    else:
        if len(history) >= min_repeats: break                  
        history = []
  return max_index

find_repeated_index(sample, 1.0, 5)
15

find_repeated_index(sample, 1.0, 4)
5

Dmytro · Answer 4 · 2020-05-12T13:33:01.127

Here's how you can solve this with O(n) runtime complexity and without allocating extra memory (not counting flattening and to list transformation).

def find_last_index_of_longest_window(array, window_value):

    if len(array) <= 0:
        return -1

    if len(array) == 1:
        return 0 if array[0] == window_value else -1

    max_length = 0
    length = 0

    for i, value in enumerate(array):
        if value == window_value:
            length += 1
        else:
            if length >= max_length:
                max_length = length
                max_index = i - 1
                length = 0

    if length > max_length:
        max_length = length
        max_index = i

    return max_index


print(find_last_index_of_longest_window(sample.flatten().tolist(), 1.0))

UPDATE: If you want to avoid flattening and conversion to list:

def find_last_index_of_longest_window(array, window_value):

    if len(array) <= 0:
        return -1

    if len(array) == 1:
        return 0 if array[0][0] == window_value else -1

    max_length = 0
    length = 0

    for i, item in enumerate(array):
        value = item[0]
        if value == window_value:
            length += 1
        else:
            if length >= max_length:
                max_length = length
                max_index = i - 1
                length = 0

    if length > max_length:
        max_length = length
        max_index = i

    return max_index


print(find_last_index_of_longest_window(sample, 1.0))

this one works too, but how do I determine which one is the most efficient way? — Ryru Lobo, May 12 '20 at 12:24
In this case, just pass the numpy array to the function and adjust the function to take the first element of the array item (because it's array of one-item arrays) but in this case the function won't be generic and will depend on numpy array layout. — Dmytro, May 12 '20 at 13:31

score 0 · Answer 5 · answered May 12 '20 at 12:17

Based on this snippet:

def find_runs(x):
    """Find runs of consecutive items in an array."""

    # ensure array
    x = np.asanyarray(x)
    if x.ndim != 1:
        raise ValueError('only 1D array supported')
    n = x.shape[0]

    # handle empty array
    if n == 0:
        return np.array([]), np.array([]), np.array([])

    else:
        # find run starts
        loc_run_start = np.empty(n, dtype=bool)
        loc_run_start[0] = True
        np.not_equal(x[:-1], x[1:], out=loc_run_start[1:])
        run_starts = np.nonzero(loc_run_start)[0]

        # find run values
        run_values = x[loc_run_start]

        # find run lengths
        run_lengths = np.diff(np.append(run_starts, n))

        return run_values, run_starts, run_lengths

# Part added by me

values,indices,lengths = find_runs(sample.flatten())
ones = np.where(values==1)
fiveormore = np.where(lengths[ones]>=5)
r = indices[ones][fiveormore]
last_indices = r + lengths[ones][fiveormore] - 1

The last_indices variable will be an array of the last indices of each 5 or longer consecutive part of the array where the value is 1. Getting the last of these indices is just a last_indices[-1] call. If there are no such indices the array will be empty.

score 0 · Answer 6 · answered May 12 '20 at 16:07

Quick profiling for large arrays shows that the following solution based on the code from Counting consecutive 1's in NumPy array will be significantly faster than the other ones presented here:

import numpy as np


def group_cumsum(a):
    """Taken from https://stackoverflow.com/a/42129610"""
    a_ext = np.concatenate(([0], a, [0]))
    idx = np.flatnonzero(a_ext[1:] != a_ext[:-1])
    a_ext[1:][idx[1::2]] = idx[::2] - idx[1::2]
    return a_ext.cumsum()[1:-1]


array = sample[:, 0]
value = 1
n = 5

mask = array == value
cumsums = group_cumsum(mask)
if not np.any(cumsums > n):
    print(f"No more than {n} consecutive {value}'s are found in the array.")
else:
    index = len(sample) - np.argmax(cumsums[::-1] > n) - 1
    print(index)  # returns 15 for your example

Finding maximum index of a specific value where it occurs consecutively more than N times

6 Answers6