5

For example, something like:

>>> [1, 2, 3].contains_sequence([1, 2])
True
>>> [1, 2, 3].contains_sequence([4])
False

I know that the in operator can do this for strings:

>>> "12" in "123"
True

But I'm looking for something that operates on iterables.

David Wolever
  • 148,955
  • 89
  • 346
  • 502
  • May the sequence appear anywhere inside the other sequence? – Jon Clements Jun 21 '12 at 03:49
  • Yes — the operation I have in mind is identical to the behaviour of the `in` operator on strings, except for generic iterables. – David Wolever Jun 21 '12 at 03:51
  • Right to be sure - both sequences aren't iterables? – Jon Clements Jun 21 '12 at 04:02
  • The haystack is an iterable, but there's no sense in requiring the needle to be an iterable (it would have to be unrolled anyway, as far as I can tell). – David Wolever Jun 21 '12 at 04:05
  • 1
    Yes, but have met some insane clients :) ie, "surely you can change our 57 table database with 200 millions rows before tomorrow" :) – Jon Clements Jun 21 '12 at 04:08
  • 1
    Okay, just realised it's just after 5am and I need to get ready for work in a bit, and been on the PC all night :( So if you don't get an acceptable answer, it'll be niggling at the back of my brown, but otherwise look forward to seeing one. GL – Jon Clements Jun 21 '12 at 04:14
  • possible duplicate of [Best Way To Determine if a Sequence is in another sequence in Python](http://stackoverflow.com/questions/425604/best-way-to-determine-if-a-sequence-is-in-another-sequence-in-python) – Elias Zamaria Jan 23 '15 at 22:35

8 Answers8

4

Referenced from https://stackoverflow.com/a/6822773/24718 modified to use a list.

from itertools import islice

def window(seq, n=2):
    """
    Returns a sliding window (of width n) over data from the iterable
    s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ...                   
    """
    it = iter(seq)
    result = list(islice(it, n))
    if len(result) == n:
        yield result    
    for elem in it:
        result = result[1:] + [elem]
        yield result

def contains_sequence(all_values, seq):
    return any(seq == current_seq for current_seq in window(all_values, len(seq)))            

test_iterable = [1,2,3]
search_sequence = [1,2]

result = contains_sequence(test_iterable, search_sequence)
Community
  • 1
  • 1
monkut
  • 42,176
  • 24
  • 124
  • 155
3

Is there a Python builtin? No. You can accomplish this task in various ways. Here is a recipe that does it, and also gives you the position of the subsequence in the containing sequence:

def _search(forward, source, target, start=0, end=None):
    """Naive search for target in source."""
    m = len(source)
    n = len(target)
    if end is None:
        end = m
    else:
        end = min(end, m)
    if n == 0 or (end-start) < n:
        # target is empty, or longer than source, so obviously can't be found.
        return None
    if forward:
        x = range(start, end-n+1)
    else:
        x = range(end-n, start-1, -1)
    for i in x:
        if source[i:i+n] == target:
            return i
    return None
lvc
  • 34,233
  • 10
  • 73
  • 98
BrenBarn
  • 242,874
  • 37
  • 412
  • 384
2

As far as I know, there's no way to do this. You can roll your own function pretty easily, but I doubt that will be terribly efficient.

>>> def contains_seq(seq,subseq):
...     #try: junk=seq[:]
...     #except: seq=tuple(seq)
...     #try: junk=subseq[:]
...     #except: subseq=tuple(subseq)
...     ll=len(subseq)
...     for i in range(len(seq)-ll):  #on python2, use xrange.
...         if(seq[i:i+ll] == subseq):
...             return True
...     return False
...
>>> contains_seq(range(10),range(3)) #True
>>> contains_seq(range(10),[2,3,6]) #False

Note that this solution does not work with generator type objects (it only works on objects that you can slice). You could check seq to see if it is sliceable before proceeding and cast to a tuple if it isn't sliceable -- But then you get rid of the benefits of slicing. You could re-write it to check one element at a time instead of using slicing, but I have a feeling performance would suffer even more.

mgilson
  • 300,191
  • 65
  • 633
  • 696
  • Thanks for answering the actual question ("is there a builtin") before providing an implementation. – David Wolever Jun 21 '12 at 04:00
  • This requires both `seq` and `subseq` to be sequence-like (and not just iterable) -- i.e., `contains_seq(xrange(10), range(3))` won't work. – Edward Loper Jun 21 '12 at 04:04
  • @mgilson: don't you mean "on python 2, use xrange"? – Junuxx Jun 21 '12 at 04:13
  • @EdwardLoper -- That's true. I actually thought of that. strictly speaking, I would consider `xrange` to be an `iterable` and not a `sequence` ( http://docs.python.org/glossary.html ). Personally, I think this function should probably only support sequences for reasons I added above. – mgilson Jun 21 '12 at 04:14
  • @mgilson The question title explicitly specifies that we should search inside an iterable, and David reiterated that in the comments for the question: "The haystack is an iterable, but there's no sense in requiring the needle to be an iterable." – Edward Loper Jun 21 '12 at 04:18
  • @EdwardLoper -- Yes, I know that's what was asked for. (and I provided a blurb on how to make my function work with arbitrary iterables). However, I still think that it is either error prone or inefficient to support arbitrary iterables in this case . – mgilson Jun 21 '12 at 04:22
  • @mgilson it's true — supporting only lists is both simpler and (could be, at least) faster. However, given that I need something which operates on iterables, it is also incorrect =\ – David Wolever Jun 21 '12 at 20:54
  • @DavidWolever -- This doesn't just work on lists, it also works on tuples and strings and anything else that you can slice. I added try/except clauses to make it work with other things (by converting them to tuples). – mgilson Jun 21 '12 at 23:30
2

As others have said, there's no builtin for this. Here's an implementation that is potentially more efficient than the other answers I've seen -- in particular, it scans through the iterable, just keeping track of what prefix sizes of the target sequence it's seen. But that increased efficiency comes at some expense in increased verbosity over some of the other approaches that have been suggested.

def contains_seq(iterable, seq):
    """
    Returns true if the iterable contains the given sequence.
    """
    # The following clause is optional -- leave it if you want to allow `seq` to
    # be an arbitrary iterable; or remove it if `seq` will always be list-like.
    if not isinstance(seq, collections.Sequence):
        seq = tuple(seq)

    if len(seq)==0: return True # corner case

    partial_matches = []
    for elt in iterable:
        # Try extending each of the partial matches by adding the
        # next element, if it matches.
        partial_matches = [m+1 for m in partial_matches if elt == seq[m]]
        # Check if we should start a new partial match
        if elt==seq[0]:
            partial_matches.append(1)
        # Check if we have a complete match (partial_matches will always
        # be sorted from highest to lowest, since older partial matches 
        # come before newer ones).
        if partial_matches and partial_matches[0]==len(seq):
            return True
    # No match found.
    return False
Edward Loper
  • 15,374
  • 7
  • 43
  • 52
  • I don't think I would use the `hasattr` bit. You pick up the ability to check `sets` and maybe some user defined objects, but I wouldn't typically expect an object which doesn't support `__getitem__` to be ordered. I think it's better to force the user to explicitly cast to a different object -- although I suppose it does allow the use of generators, etc (but casts away all of their benefits)... – mgilson Jun 21 '12 at 04:07
  • The `hasattr` bit was put in to allow `seq` (the "needle") to be any arbitrary iterable, if that was desired -- e.g., `contains_seq(xrange(100), xrange(3,8))`. If it's known that the needle is list-like, then you can get rid of that. – Edward Loper Jun 21 '12 at 04:08
  • The `hasattr` check would be better spelled as: `if not isinstance(seq, collections.Sequence)` (although that has slightly different semantics, so you might also want to check against `collections.Mapping`). And it should really go *before* the check for an empty sequence. – lvc Jun 21 '12 at 04:26
2

If preserving of order is not necessary, you can use sets (builtin):

>>> set([1,2]).issubset([1,2,3])
True
>>> set([4]).issubset([1,2,3])
False

Otherwise:

def is_subsequence(sub, iterable):
    sub_pos, sub_len = 0, len(sub)
    for i in iterable:
        if i == sub[sub_pos]:
            sub_pos += 1
            if sub_pos >= sub_len:
                return True
        else:
            sub_pos = 0
    return False

>>> is_subsequence([1,2], [0,1,2,3,4])
True
>>> is_subsequence([2,1], [0,1,2,3,4]) # order preserved
False
>>> is_subsequence([1,2,4], [0,1,2,3,4])
False

This one works with any iterator.

1

deque appears to be useful here:

from collections import deque

def contains(it, seq):
    seq = deque(seq)
    deq = deque(maxlen=len(seq))
    for p in it:
        deq.append(p)
        if deq == seq:
            return True
    return False

Note that this accepts arbitrary iterables for both arguments (no slicing required).

georg
  • 211,518
  • 52
  • 313
  • 390
0

As there's no builtin, I made a nice version:

import itertools as it

def contains(seq, sub):
    seq = iter(seq)
    o = object()
    return any(all(i==j for i,j in zip(sub, it.chain((n,),seq, 
                                      (o for i in it.count())))) for n in seq)

This do not require any extra lists (if you use it.izip or Py3k).

>>> contains([1,2,3], [1,2])
True
>>> contains([1,2,3], [1,2,3])
True
>>> contains([1,2,3], [2,3])
True
>>> contains([1,2,3], [2,3,4])
False

Extra points if you have no trouble reading it. (It does the job, but the implementation is not to be taked too seriously). ;)

JBernardo
  • 32,262
  • 10
  • 90
  • 115
-2

You could convert it into a string and then do matching on it

full_list = " ".join([str(x) for x in [1, 2, 3]])
seq = " ".join([str(x) for x in [1, 2]])
seq in full_list
Yunchi
  • 5,529
  • 2
  • 17
  • 18
  • This would only work if the items can be sensibly converted to strings. – David Wolever Jun 21 '12 at 03:49
  • Yes, but the example given was for integers, which can be converted into string pretty easily. Not only that, using str.join and then the in operator is a very efficient way to do this, since str.join is an efficient operation compared to iterating the list. – Yunchi Jun 21 '12 at 03:53
  • 1
    This would also give the True for the input `['list','one','list','two']`, seaching for `['list one','list two']`. While a rare case, it's still worth mentioning. – mgilson Jun 21 '12 at 03:57
  • Yes, but the example is just an example. Both the title and the last paragraph make it clear that I'm interested in generic iterables. Additionally, the methods you've proposed are incorrect, even when the input is only integers: consider the case where I want to determine whether `[12]` contains the sequence `[1]` (or determine whether `["foobar"]` contains `["foo"]`). – David Wolever Jun 21 '12 at 03:57
  • Ah that true, I didn't think about any of those cases – Yunchi Jun 21 '12 at 04:03