Find indexes of repeated elements in an array (Python, NumPy)

Question

Assume, I have a NumPy-array of integers, as:

[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]

I want to find the start and end indices of the array, where a value is more than x-times (say 5-times) repeated. So in the case above, it is the value 22 and 6. Start index of the repeated 22 is 3 and end-index is 8. Same for the repeatening 6. Is there a special tool in Python that is helpful? Otherwise, I would loop through the array index for index and compare the actual value with the previous.

Regards.

@Evan: I don't think that applies so well: `mode` works on values anywhere in the array, not necessarily contiguous. — Prune, Jun 27 '17 at 22:29
This question is really relevant to any sort of sequence container in python, not just numpy arrays. — Liam Bohl, Jun 27 '17 at 22:54

EFT · Accepted Answer · 2017-06-27T23:01:34.447

Using np.diff and the method given here by @WarrenWeckesser for finding runs of zeros in an array:

import numpy as np

def zero_runs(a):  # from link
    iszero = np.concatenate(([0], np.equal(a, 0).view(np.int8), [0]))
    absdiff = np.abs(np.diff(iszero))
    ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
    return ranges

a = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]

zero_runs(np.diff(a))
Out[87]: 
array([[ 3,  8],
       [15, 22]], dtype=int32)

This can then be filtered on the difference between the start & end of the run:

runs = zero_runs(np.diff(a))

runs[runs[:, 1]-runs[:, 0]>5]  # runs of 7 or more, to illustrate filter
Out[96]: array([[15, 22]], dtype=int32)

pylang · Answer 2 · 2017-09-05T19:50:57.390

Here is a solution using Python's native itertools.

Code

import itertools as it


def find_ranges(lst, n=2):
    """Return ranges for `n` or more repeated values."""
    groups = ((k, tuple(g)) for k, g in it.groupby(enumerate(lst), lambda x: x[-1]))
    repeated = (idx_g for k, idx_g in groups if len(idx_g) >=n)
    return ((sub[0][0], sub[-1][0]) for sub in repeated)

lst = [34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]    
list(find_ranges(lst, 5))
# [(3, 8), (15, 22)]

Tests

import nose.tools as nt


def test_ranges(f):
    """Verify list results identifying ranges."""
    nt.eq_(list(f([])), [])
    nt.eq_(list(f([0, 1,1,1,1,1,1, 2], 5)), [(1, 6)])
    nt.eq_(list(f([1,1,1,1,1,1, 2,2, 1, 3, 1,1,1,1,1,1], 5)), [(0, 5), (10, 15)])
    nt.eq_(list(f([1,1, 2, 1,1,1,1, 2, 1,1,1], 3)), [(3, 6), (8, 10)])    
    nt.eq_(list(f([1,1,1,1, 2, 1,1,1, 2, 1,1,1,1], 3)), [(0, 3), (5, 7), (9, 12)])

test_ranges(find_ranges)

This example captures (index, element) pairs in lst, and then groups them by element. Only repeated pairs are retained. Finally, first and last pairs are sliced, yielding (start, end) indices from each repeated group.

See also this post for finding ranges of indices using itertools.groupby.

score 1 · Answer 3 · answered Jun 27 '17 at 22:32

1

There really isn't a great short-cut for this. You can do something like:

mult = 5
for elem in val_list:
    target = [elem] * mult
    found_at = val_list.index(target)

I leave the not-found exceptions and longer sequence detection to you.

answered Jun 27 '17 at 22:32

Prune

76,765
14
60
81

score 0 · Answer 4 · answered Jun 27 '17 at 22:37

If you're looking for value repeated n times in list L, you could do something like this:

def find_repeat(value, n, L):
    look_for = [value for _ in range(n)]
    for i in range(len(L)):
        if L[i] == value and L[i:i+n] == look_for:
            return i, i+n

score 0 · Answer 5 · answered Jun 27 '17 at 22:53

Here is a relatively quick, errorless solution which also tells you how many copies were in the run. Some of this code was borrowed from KAL's solution.

# Return the start and (1-past-the-end) indices of the first instance of
# at least min_count copies of element value in container l 
def find_repeat(value, min_count, l):
  look_for = [value for _ in range(min_count)]
  for i in range(len(l)):
    count = 0
    while l[i + count] == value:
      count += 1
    if count >= min_count:
      return i, i + count

score 0 · Answer 6 · answered Jun 14 '21 at 10:54

I had a similar requirement. This is what I came up with, using only comprehension lists:

A=[34,2,3,22,22,22,22,22,22,18,90,5,-55,-19,22,6,6,6,6,6,6,6,6,23,53,1,5,-42,82]

Find unique and return their indices

_, ind = np.unique(A,return_index=True)

np.unique sorts the array, sort the indices to get the indices in the original order

ind = np.sort(ind)

ind contains the indices of the first element in the repeating group, visible by non-consecutive indices Their diff gives the number of elements in a group. Filtering using np.diff(ind)>5 shall give a boolean array with True at the starting indices of groups. The ind array contains the end indices of each group just after each True in the filtered list

Create a dict with the key as the repeating element and the values as a tuple of start and end indices of that group

rep_groups = dict((A[ind[i]], (ind[i], ind[i+1]-1)) for i,v in enumerate(np.diff(ind)>5) if v)

Find indexes of repeated elements in an array (Python, NumPy)

6 Answers6

Linked