1

I have a list of values:

a = [1,3,4,5,2]

I now want the following function:

does_segment_exist(a, [1,3,4]) #True
does_segment_exist(a, [3,4,5]) #True
does_segment_exist(a, [4,5,2]) #True
does_segment_exist(a, [1,4,5]) #False
does_segment_exist(a, [1,3]) #True
does_segment_exist(a, [1,4]) #False

So the values must be found in consecutive order.

I there a clever way of doing this in Python 3?

agf
  • 171,228
  • 44
  • 289
  • 238
Baz
  • 12,713
  • 38
  • 145
  • 268
  • 1
    possible duplicate of [Check for presence of a sublist in Python](http://stackoverflow.com/questions/3313590/check-for-presence-of-a-sublist-in-python) – Fred Foo Nov 01 '11 at 15:41

3 Answers3

4

You can use a rolling window iterator, in this case one from an old version of the itertools docs:

from itertools import islice

def window(seq, n=2):
    "Returns a sliding window (of width n) over data from the iterable"
    "   s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   "
    it = iter(seq)
    result = tuple(islice(it, n))
    if len(result) == n:
        yield result
    for elem in it:
        result = result[1:] + (elem,)
        yield result

def does_segment_exist(iterable, sublist):
    return tuple(sublist) in window(iterable, len(sublist))

print(does_segment_exist([1,3,4,5,2], [3,4,5]))

If you only need it to work on lists, not any iterable, you can use:

def does_segment_exist(seq, sublist):
    # seq and sublist must both be lists
    n = len(sublist)
    return sublist in (seq[i:i+n] for i in range(len(seq) + 1 - n))

A basic implementation of the method mentioned by Raymond:

def does_segment_exist(seq, sublist):
    first = sublist[0]
    i = 0
    n = len(sublist)
    while True:
        try:
            i = seq.index(first, i)
        except ValueError:
            return False
        if sublist == seq[i:i+n]:
            return True
        i += 1

print(does_segment_exist([1,3,4,5,2], [3,4,5]))

The advantage of this method is that it doesn't have to slice for every index up to the first match, just for indexes corresponding to matches for the first value in the segment.

agf
  • 171,228
  • 44
  • 289
  • 238
1

There are many ways to do this and they are all isomorphic to substring search algorithms.

The easiest way is the naïve search using list.index() to find a common start point and then using a slice to check for a full match. If there is not a match, repeat the search until you hit the end of the list.

Raymond Hettinger
  • 216,523
  • 63
  • 388
  • 485
  • >>> help(list.index) Help on method_descriptor: index(...) L.index(value, [start, [stop]]) -> integer -- return first index of value. Raises ValueError if the value is not present. – Raymond Hettinger Nov 01 '11 at 16:10
  • Any particular reason that's not shown on the standard types page of the docs? – agf Nov 01 '11 at 16:17
  • File a bug tracker report for a doc update. It was likely overlooked (at one point, not all sequences had index()). – Raymond Hettinger Nov 01 '11 at 16:39
1

This should work with Python 2.5 and newer:

def does_segment_exist(sequence, segment):
    n, m = len(sequence), len(segment)
    return any(segment == sequence[i:i+m] for i in range(n+1-m))
Cito
  • 5,365
  • 28
  • 30
  • If you look at my second one it is the same -- short circuiting slice comparison. `in` just automatically does the `==`. – agf Nov 01 '11 at 16:13
  • Sorry, overlooked that. Yes, it's essentially the same, and there should not be noteworthy performance differences. – Cito Nov 01 '11 at 16:31
  • Actually your solution using "in" even looks more elegant and works with older Python versions. – Cito Nov 01 '11 at 16:42