17

I'm trying to identify if a large list has consecutive elements that are the same.

So let's say:

lst = [1, 2, 3, 4, 5, 5, 6]

And in this case, I would return true, since there are two consecutive elements lst[4] and lst[5], are the same value.

I know this could probably be done with some sort of combination of loops, but I was wondering if there were a more efficient way to do this?

Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
tzhu10
  • 345
  • 1
  • 2
  • 7
  • Loop. Singular. And I doubt it. You'd have to look at every element once to know if there wasn't a consecutive pair. – Holloway Aug 01 '16 at 21:53
  • Also see https://stackoverflow.com/questions/5389507/iterating-over-every-two-elements-in-a-list and https://stackoverflow.com/questions/33116880/how-do-i-check-if-two-consecutive-numbers-integers-in-a-list-have-the-same-val – PM 2Ring Jun 07 '17 at 13:56

8 Answers8

21

You can use itertools.groupby() and a generator expression within any() *:

>>> from itertools import groupby
>>> any(sum(1 for _ in g) > 1 for _, g in groupby(lst))
True

Or as a more Pythonic way you can use zip(), in order to check if at least there are two equal consecutive items in your list:

>>> any(i==j for i,j in zip(lst, lst[1:])) # In python-2.x,in order to avoid creating a 'list' of all pairs instead of an iterator use itertools.izip()
True

Note: The first approach is good when you want to check if there are more than 2 consecutive equal items, otherwise, in this case the second one takes the cake!


* Using sum(1 for _ in g) instead of len(list(g)) is very optimized in terms of memory use (not reading the whole list in memory at once) but the latter is slightly faster.

Mazdak
  • 105,000
  • 18
  • 159
  • 188
9

You can use a simple any condition:

lst = [1, 2, 3, 4, 5, 5, 6]
any(lst[i]==lst[i+1] for i in range(len(lst)-1))
#outputs:
True

any return True if any of the iterable elements are True

Ohad Eytan
  • 8,114
  • 1
  • 22
  • 31
4

If you're looking for an efficient way of doing this and the lists are numerical, you would probably want to use numpy and apply the diff (difference) function:

>>> numpy.diff([1,2,3,4,5,5,6])
array([1, 1, 1, 1, 0, 1])

Then to get a single result regarding whether there are any consecutive elements:

>>> numpy.any(~numpy.diff([1,2,3,4,5,5,6]).astype(bool))

This first performs the diff, inverts the answer, and then checks if any of the resulting elements are non-zero.

Similarly,

>>> 0 in numpy.diff([1, 2, 3, 4, 5, 5, 6])

also works well and is similar in speed to the np.any approach (credit for this last version to heracho).

jmetz
  • 12,144
  • 3
  • 30
  • 41
  • Wouldn't this return `True` if there were any duplicates? It sounds like the question was asking for consecutive duplicated values. – johnchase Aug 01 '16 at 21:59
  • @johnchase: No, as the `diff` works on consecutive elements (without sorting) – jmetz Aug 01 '16 at 22:00
  • Okay, I got it now. Nice solution! – johnchase Aug 01 '16 at 22:07
  • (0 in np.diff([1, 2, 3, 4, 5, 5, 6])) would do it too. – heracho Aug 14 '20 at 19:48
  • 1
    @heracho - good point - I had expected `np.any` to provide significant speed up in comparison with `in`, but it turns out that they are very similar in speed when testing with `%%timeit` in ipython! – jmetz Aug 20 '20 at 10:01
2

Here a more general numpy one-liner:

number = 7
n_consecutive = 3
arr = np.array([3, 3, 6, 5, 8, 7, 7, 7, 4, 5])
#                              ^  ^  ^ 
 
np.any(np.convolve(arr == number, v=np.ones(n_consecutive), mode='valid') 
       == n_consecutive)[0]

This method always searches the whole array, while the approach from @Kasramvd ends when the condition is first met. So which method is faster dependents on how sparse those cases of consecutive numbers are. If you are interested in the positions of the consecutive numbers, and have to look at all elements of the array this approach should be faster (for larger arrays (or/and longer sequences)).

idx = np.nonzero(np.convolve(arr==number, v=np.ones(n_consecutive), mode='valid') 
                             == n_consecutive)
# idx = i: all(arr[i:i+n_consecutive] == number)

If you are not interested in a specific value but at all consecutive numbers in general a slight variation of @jmetz's answer:

np.any(np.convolve(np.abs(np.diff(arr)), v=np.ones(n_consecutive-1), mode='valid') == 0)
#                  ^^^^^^
# EDIT see djvg's comment
scleronomic
  • 4,392
  • 1
  • 13
  • 43
  • 1
    The `any(convolve(diff(...), ...), ...)` solution yields false positives: it basically calculates a moving sum of the differences between consecutive elements, then checks if this sum is zero anywhere. However, the sum of **nonzero** differences (i.e. consecutive elements NOT equal) can also be zero. Minimal example: `arr=[1,0,1,0,1]`, `n_consecutive=3`. The issue also becomes evident if you look at the equivalent convolution kernel for the example: `[1,0,-1]` – djvg Oct 21 '21 at 08:18
  • 1
    you are right. I guess one easy fix is to take the absolute values of the difference. – scleronomic Oct 21 '21 at 18:01
1

Starting in Python 3.10, the new pairwise function provides a way to slide through pairs of consecutive elements, so that we can test the quality between consecutive elements:

from itertools import pairwise

any(x == y for (x, y) in pairwise([1, 2, 3, 4, 5, 5, 6]))
# True

The intermediate result of pairwise:

pairwise([1, 2, 3, 4, 5, 5, 6])
# [(1, 2), (2, 3), (3, 4), (4, 5), (5, 5), (5, 6)]
Xavier Guihot
  • 54,987
  • 21
  • 291
  • 190
0

A simple for loop should do it:

def check(lst):
    last = lst[0]
    for num in lst[1:]:
        if num == last:
            return True
        last = num
    return False


lst = [1, 2, 3, 4, 5, 5, 6]
print (check(lst)) #Prints True

Here, in each loop, I check if the current element is equal to the previous element.

Vaibhav Bajaj
  • 1,934
  • 16
  • 29
0

The convolution approach suggested in scleronomic's answer is very promising, especially if you're looking for more than two consecutive elements.

However, the implementation presented in that answer might not be the most efficient, because it consists of two steps: diff() followed by convolve().

Alternative implementation

If we consider that the diff() can also be calculated using convolution, we can combine the two steps into a single convolution.

The following alternative implementation only requires a single convolution of the full signal, which is advantageous if the signal has many elements.

Note that we cannot take the absolute values of the diff (to prevent false positives, as mentioned in this comment), so we add some random noise to the unit kernel instead.

# example signal
signal = numpy.array([1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0])
# minimum number of consecutive elements
n_consecutive = 3
# convolution kernel for weighted moving sum (with small random component)
rng = numpy.random.default_rng()
random_kernel = 1 + 0.01 * rng.random(n_consecutive - 1)
# convolution kernel for first-order difference (similar to numpy.diff)
diff_kernel = [1, -1]
# combine the kernels so we only need to do one convolution with the signal
combined_kernel = numpy.convolve(diff_kernel, random_kernel, mode='full')
# convolve the signal to get the moving weighted sum of differences
moving_sum_of_diffs = numpy.convolve(signal, combined_kernel, mode='valid')
# check if moving sum is zero anywhere
result = numpy.any(moving_sum_of_diffs == 0)

See the DSP guide for a detailed discussion of convolution.

Timing

The difference between the two implementations boils down to this:

def original(signal, unit_kernel):
    return numpy.convolve(numpy.abs(numpy.diff(signal)), unit_kernel, mode='valid')


def alternative(signal, combined_kernel):
    return numpy.convolve(signal, combined_kernel, mode='valid')

where unit_kernel = numpy.ones(n_consecutive - 1) and combined_kernel is defined above.

Comparison of these two functions, using timeit, shows that alternative() can be several times faster, for small kernel sizes (i.e. small value of n_consecutive). However, for large kernel sizes the advantage becomes negligible, because the convolution becomes dominant (compared to the diff).

Notes:

  • For large kernel sizes I would prefer the original two-step approach, as I think it is easier to understand.
  • Due to numerical issues it may be necessary to replace numpy.any(moving_sum_of_diffs == 0) by numpy.any(numpy.abs(moving_sum_of_diffs) < very_small_number), see e.g. here.
djvg
  • 11,722
  • 5
  • 72
  • 103
-1

My solution for this if you want to find out whether 3 consecutive values are equal to 7. For example, a tuple of intList = (7, 7, 7, 8, 9, 1):

for i in range(len(intList) - 1):
        if intList[i] == 7 and intList[i + 2] == 7 and intList[i + 1] == 7:
            return True
    return False
lloydyu24
  • 753
  • 1
  • 8
  • 17