How to test if a list contains another list as a contiguous subsequence?

Question

How can I test if a list contains another list (ie. it's a contiguous subsequence). Say there was a function called contains:

contains([1,2], [-1, 0, 1, 2]) # Returns [2, 3] (contains returns [start, end])
contains([1,3], [-1, 0, 1, 2]) # Returns False
contains([1, 2], [[1, 2], 3]) # Returns False
contains([[1, 2]], [[1, 2], 3]) # Returns [0, 0]

Edit:

contains([2, 1], [-1, 0, 1, 2]) # Returns False
contains([-1, 1, 2], [-1, 0, 1, 2]) # Returns False
contains([0, 1, 2], [-1, 0, 1, 2]) # Returns [1, 3]

For what it's worth, returning `[start, end+1]` is more pythonic as it looks like a slice -- `(end+1)-start` gives the length of what is found. — Andrew Jaffe, Oct 02 '10 at 21:01
This looks like a bad design - sometimes the function returns a bool, sometimes it returns a list. That makes it very hard to use since you have to check the return type before you can do anything with the result. IMHO a function called "contains" should only return True or False. — Dave Kirby, Oct 02 '10 at 21:01
It's kinda sad that lists don't have the needed functionality built-in, but strings do (`str.find`). — Jochen Ritzel, Oct 02 '10 at 21:12
Why would this, for any reason, return a list and not a tuple!? — Grant Paul, Oct 03 '10 at 00:11
related: [Best Way To Determine if a Sequence is in another sequence in Python](http://stackoverflow.com/q/425604/4279) — jfs, Mar 23 '17 at 23:20
Returning `False` is questionable since `False == 0`, and `0` is a valid `start`, so that could be confusing. Instead, I'd raise a `ValueError`. — wjandrea, May 02 '21 at 16:57
Does this answer your question? [Check for presence of a sliced list in Python](https://stackoverflow.com/questions/3313590/check-for-presence-of-a-sliced-list-in-python) — user202729, Dec 05 '21 at 18:05

score 101 · Answer 1 · edited Nov 13 '17 at 12:38

101

If all items are unique, you can use sets.

>>> items = set([-1, 0, 1, 2])
>>> set([1, 2]).issubset(items)
True
>>> set([1, 3]).issubset(items)
False

edited Nov 13 '17 at 12:38

SiHa

7,830
13
34
43

answered Oct 02 '10 at 20:22

Thomas O

6,026
12
42
60

30

that's not what op is looking for – SilentGhost Oct 02 '10 at 20:24
It would work for unique lists. In fact it would work for non-unique items, but it wouldn't be able to determine between the individual items. (so you couldn't compare between [1, 1, 2] and [1, 2].) – Thomas O Oct 02 '10 at 20:25
Okay, that's why I used the qualifier "if all items are unique." I didn't realise, as your example didn't make it clear that you needed distinction between identical items. – Thomas O Oct 02 '10 at 20:28
5

It still won't work, even if all items are unique, unless they are also in the same order, with no intermingled items. e.g. finding `[1, 3]` or `[2, 1]` in `[1, 2, 3]` will give a false positive. Assuming that we're looking for the sequence itself rather than just the values contained in the sequence. – intuited Oct 02 '10 at 20:34
Oh. Thanks for pointing that out. eumiro's answer looks better then. – Thomas O Oct 02 '10 at 20:37
@Thomas O When you post an answer that is a) wrong, b) not what OP is looking for or C) all of the above, there is a 'delete' link at the bottom left of your post. feel free to click it and 1) declutter the thread, 2) avoid exposing yourself to further -1's. – aaronasterling Oct 02 '10 at 21:51
8

I think it's better to show how an idea can be wrong (so as to avoid it in future), rather than erasing it entirely. – Thomas O Oct 02 '10 at 22:03
30

Yes, especially since this is the correct solution for some people (like me) who find this page, who do not care about sequence. – mk12 Feb 26 '12 at 05:18
2

Note to viewer: the results provided in this example is not accurate. That being said, this example helped me out. – pangyuteng Oct 10 '14 at 17:04
1

@mk12 sucks for people like me who do and the first 2 questions I've been on no one cares about the order – Boris Verkhovskiy Jul 15 '21 at 01:22
@intuited IDK what version of python did you use but with python3.10 if items are in random order everything works fine, no false positives `set([3, 1]).issubset(set([1, 2, 3, 4])) # True` – Qback Nov 10 '22 at 13:26

score 53 · Answer 2 · edited Jun 16 '20 at 19:21

53

There's an all() and any() function to do this. To check if big contains ALL elements in small

result = all(elem in big for elem in small)

To check if small contains ANY elements in big

result = any(elem in big for elem in small)

the variable result would be boolean (TRUE/FALSE).

edited Jun 16 '20 at 19:21

renan-eccel

181
10

answered May 31 '18 at 23:00

ericyan3000

603
5
6

5

If I am not mistaken `all(elem in list1 for elem in list2)` is testing all items in list2 against items in list1 so it is other way around. This checks list2 contains all items in list1 – rifle2000 Oct 08 '19 at 08:04
3

`all(elem in big for elem in small)` would return true regardless of position, which isn't what OP wants. They asked for "contiguous subsequence". – Abhijit Sarkar Jul 27 '21 at 11:05
In case anyone confused , here is example `ColumnToCheck = [1,2,3,4] ListofItems =[1,2,3,4,251,54,8,65,5,85,25] #if all items from ColumnToCheck exist in ListofItems print(all(x in ListofItems for x in ColumnToCheck))` – user1586957 Jan 20 '22 at 14:00
1

Seems to have answered the wrong question. – Jason R. Coombs Oct 13 '22 at 13:38

score 41 · Accepted Answer · edited May 02 '21 at 16:46

41

Here is my version:

def contains(small, big):
    for i in xrange(len(big)-len(small)+1):
        for j in xrange(len(small)):
            if big[i+j] != small[j]:
                break
        else:
            return i, i+len(small)
    return False

It returns a tuple of (start, end+1) since I think that is more pythonic, as Andrew Jaffe points out in his comment. It does not slice any sublists so should be reasonably efficient.

One point of interest for newbies is that it uses the else clause on the for statement - this is not something I use very often but can be invaluable in situations like this.

This is identical to finding substrings in a string, so for large lists it may be more efficient to implement something like the Boyer-Moore algorithm.

Note: If you are using Python3, change xrange to range.

edited May 02 '21 at 16:46

wjandrea

28,235
9
60
81

answered Oct 02 '10 at 21:23

Dave Kirby

25,806
5
67
84

+1 for the note about efficient string searching algorithms. One disadvantage of yours is the addition of an interpreted inner loop (the slice comparison is, I imagine, faster, although the copy might offset that). I'm going to try a performance comparison. – Tim Yates Oct 02 '10 at 21:47
1

After further tests, yours *is* the best so far for large subsequences. I would pick this, even despite the small disadvantage is has on smaller data sets. – Tim Yates Oct 02 '10 at 22:52
3

+1: Didn't know about `for`'s `else` clause! Just today I created an awkward construct involving setting a boolean to do exactly this. – mk12 Feb 26 '12 at 05:16
I think it's simpler to use slicing, similar to [martineau's answer](https://stackoverflow.com/a/3847612/4518341): `if big[i:i+len(small)] == small: return i, i+len(small)`. – wjandrea May 02 '21 at 16:50
You can save on calculating `len(small)` over and over by defining it at the top, like `n = len(small)` – wjandrea May 02 '21 at 16:54
I am surprised that there is no mention of KMP algorithm which has the best *worst time complexity* of O(n). – kaushalpranav Oct 04 '21 at 17:21
1

Amazing that there are *two* higher-voted answers that solve *completely unrelated* problems. I guess the accept button has *some* value.... – Karl Knechtel Aug 08 '22 at 01:38

score 6 · Answer 4 · answered Oct 03 '10 at 11:31

6

May I humbly suggest the Rabin-Karp algorithm if the big list is really big. The link even contains almost-usable code in almost-Python.

answered Oct 03 '10 at 11:31

9000

39,899
9
66
104

martineau · Answer 5 · 2020-11-25T23:47:31.363

5

This works and is fairly fast since it does the linear searching using the builtin list.index() method and == operator:

def contains(sub, pri):
    M, N = len(pri), len(sub)
    i, LAST = 0, M-N+1
    while True:
        try:
            found = pri.index(sub[0], i, LAST) # find first elem in sub
        except ValueError:
            return False
        if pri[found:found+N] == sub:
            return [found, found+N-1]
        else:
            i = found+1

edited Nov 25 '20 at 23:47

answered Oct 02 '10 at 21:33

martineau

119,623
25
170
301

Not sure if there is a flaw can you matineau double check. I tried print (contains((1,2,3),(1,2,3))) and I got [0,2] instead of [0,1,2] – user-asterix May 04 '20 at 17:16
@user-asterix: The function is effectively returning `[start, end]` as designed (and OP wants). – martineau May 04 '20 at 17:58

score 4 · Answer 6 · answered Jul 08 '14 at 14:41

4

If we refine the problem talking about testing if a list contains another list with as a sequence, the answer could be the next one-liner:

def contains(subseq, inseq):
    return any(inseq[pos:pos + len(subseq)] == subseq for pos in range(0, len(inseq) - len(subseq) + 1))

Here unit tests I used to tune up this one-liner:

https://gist.github.com/anonymous/6910a85b4978daee137f

answered Jul 08 '14 at 14:41

Oleksiy

91
1
6

I think this is the answer for me! nice one-liner, looks right. – hwjp Dec 27 '16 at 12:06
2

incidentally though, I think `return True` might pass all those unit tests. – hwjp Dec 28 '16 at 17:37

HIMANSHU PANDEY · Answer 7 · 2021-11-22T09:46:19.267

I've Summarized and evaluated Time taken by different techniques

Used methods are:

def containsUsingStr(sequence, element:list):
    return str(element)[1:-1] in str(sequence)[1:-1]


def containsUsingIndexing(sequence, element:list):
    lS, lE = len(sequence), len(element)
    for i in range(lS - lE + 1):
        for j in range(lE):
            if sequence[i+j] != element[j]: break
        else: return True
    return False


def containsUsingSlicing(sequence, element:list):
    lS, lE = len(sequence), len(element)
    for i in range(lS - lE + 1):
        if sequence[i : i+lE] == element: return True
    return False


def containsUsingAny(sequence:list, element:list):
    lE = len(element)
    return any(element == sequence[i:i+lE] for i in range(len(sequence)-lE+1))

Code for Time analysis (averaging over 1000 iterations):

from time import perf_counter

functions = (containsUsingStr, containsUsingIndexing, containsUsingSlicing, containsUsingAny)
fCount = len(functions)


for func in functions:
    print(str.ljust(f'Function : {func.__name__}', 32), end='   ::   Return Values: ')
    print(func([1,2,3,4,5,5], [3,4,5,5]) , end=', ')
    print(func([1,2,3,4,5,5], [1,3,4,5]))



avg_times = [0]*fCount
for _ in range(1000):
    perf_times = []
    for func in functions:
        startTime = perf_counter()
        func([1,2,3,4,5,5], [3,4,5,5])
        timeTaken = perf_counter()-startTime
        perf_times.append(timeTaken)
        

    for t in range(fCount): avg_times[t] += perf_times[t]

minTime = min(avg_times)
print("\n\n Ratio of Time of Executions : ", ' : '.join(map(lambda x: str(round(x/minTime, 4)), avg_times)))

Output:

Conclusion: In this case, Slicing operation proves to be the fastest

score 3 · Answer 8 · edited Oct 03 '10 at 00:11

3

After OP's edit:

def contains(small, big):
    for i in xrange(1 + len(big) - len(small)):
        if small == big[i:i+len(small)]:
            return i, i + len(small) - 1
    return False

edited Oct 03 '10 at 00:11

Grant Paul

5,852
2
33
36

answered Oct 02 '10 at 20:26

eumiro

207,213
34
299
261

But it fails with contains([1,2], [-1, 0, 1, 1, 2]) which returns [2,4] instead of what I assume is the expected [3,4] – Andrew Dalke Oct 02 '10 at 20:40
2

This is going to be horribly inefficient for big lists, since it is constantly creating and destroying temporary lists every time it does `big[i:i+len(small)]` – Dave Kirby Oct 02 '10 at 21:22
@Dave: did you timeit this against your solution? – SilentGhost Oct 02 '10 at 21:41
1

According to my tests, this has slightly better performance than Dave Kirby's solution, even on large lists (1 million elements, with the matching subset at the end): 4.1s for 10 repetitions versus 5.6s for Dave's. I would love to post my test code, but there isn't an easy way to do that. – Tim Yates Oct 02 '10 at 22:20
@Tim Yates: Could you post your test code in a [gist](gist.github.com) or other pastebin-type deal? Maybe [codepad](codepad.org) would be a better option, though I don't know if running it on that site would be a) possible; b) reliable. It seems to be borken at the moment anyway. – intuited Oct 02 '10 at 22:34
1

**UPDATE**: I spoke too soon--my small lists were too small. This algorithm exploded once I increased their size to 1000, while the others stayed constant. It looks like Dave Kirby's wins for large lists after all. http://pastebin.com/NZwU6PUx – Tim Yates Oct 02 '10 at 22:46
This should return a tuple, not a list. A list is completely inappropriate for this usage. Corrected in the original post. – Grant Paul Oct 03 '10 at 00:11

score 1 · Answer 9 · edited Feb 04 '16 at 16:23

1

Smallest code:

def contains(a,b):
    str(a)[1:-1].find(str(b)[1:-1])>=0

edited Feb 04 '16 at 16:23

Fabio Lamanna

20,504
24
90
122

answered Feb 04 '16 at 15:31

Bart Mensfort

995
11
21

score 1 · Answer 10 · answered Oct 02 '10 at 23:44

Here's a straightforward algorithm that uses list methods:

#!/usr/bin/env python

def list_find(what, where):
    """Find `what` list in the `where` list.

    Return index in `where` where `what` starts
    or -1 if no such index.

    >>> f = list_find
    >>> f([2, 1], [-1, 0, 1, 2])
    -1
    >>> f([-1, 1, 2], [-1, 0, 1, 2])
    -1
    >>> f([0, 1, 2], [-1, 0, 1, 2])
    1
    >>> f([1,2], [-1, 0, 1, 2])
    2
    >>> f([1,3], [-1, 0, 1, 2])
    -1
    >>> f([1, 2], [[1, 2], 3])
    -1
    >>> f([[1, 2]], [[1, 2], 3])
    0
    """
    if not what: # empty list is always found
        return 0
    try:
        index = 0
        while True:
            index = where.index(what[0], index)
            if where[index:index+len(what)] == what:
                return index # found
            index += 1 # try next position
    except ValueError:
        return -1 # not found

def contains(what, where):
    """Return [start, end+1] if found else empty list."""
    i = list_find(what, where)
    return [i, i + len(what)] if i >= 0 else [] #NOTE: bool([]) == False

if __name__=="__main__":
    import doctest; doctest.testmod()

Akila D. Perera · Answer 11 · 2018-06-25T17:01:29.210

1

Here is my answer. This function will help you to find out whether B is a sub-list of A. Time complexity is O(n).

`def does_A_contain_B(A, B): #remember now A is the larger list
    b_size = len(B)
    for a_index in range(0, len(A)):
        if A[a_index : a_index+b_size]==B:
            return True
    else:
        return False`

edited Jun 25 '18 at 17:01

answered Jan 04 '17 at 05:29

Akila D. Perera

71
1
8

intuited · Answer 12 · 2010-10-02T22:22:53.040

I tried to make this as efficient as possible.

It uses a generator; those unfamiliar with these beasts are advised to check out their documentation and that of yield expressions.

Basically it creates a generator of values from the subsequence that can be reset by sending it a true value. If the generator is reset, it starts yielding again from the beginning of sub.

Then it just compares successive values of sequence with the generator yields, resetting the generator if they don't match.

When the generator runs out of values, i.e. reaches the end of sub without being reset, that means that we've found our match.

Since it works for any sequence, you can even use it on strings, in which case it behaves similarly to str.find, except that it returns False instead of -1.

As a further note: I think that the second value of the returned tuple should, in keeping with Python standards, normally be one higher. i.e. "string"[0:2] == "st". But the spec says otherwise, so that's how this works.

It depends on if this is meant to be a general-purpose routine or if it's implementing some specific goal; in the latter case it might be better to implement a general-purpose routine and then wrap it in a function which twiddles the return value to suit the spec.

def reiterator(sub):
    """Yield elements of a sequence, resetting if sent ``True``."""
    it = iter(sub)
    while True:
        if (yield it.next()):
            it = iter(sub)

def find_in_sequence(sub, sequence):
    """Find a subsequence in a sequence.

    >>> find_in_sequence([2, 1], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([-1, 1, 2], [-1, 0, 1, 2])
    False
    >>> find_in_sequence([0, 1, 2], [-1, 0, 1, 2])
    (1, 3)
    >>> find_in_sequence("subsequence",
    ...                  "This sequence contains a subsequence.")
    (25, 35)
    >>> find_in_sequence("subsequence", "This one doesn't.")
    False

    """
    start = None
    sub_items = reiterator(sub)
    sub_item = sub_items.next()
    for index, item in enumerate(sequence):
        if item == sub_item:
            if start is None: start = index
        else:
            start = None
        try:
            sub_item = sub_items.send(start is None)
        except StopIteration:
            # If the subsequence is depleted, we win!
            return (start, index)
    return False

A valiant effort, but this has worse performance than either eumiro or Dave Kirby's solutions. 8.2s on the benchmark I described in the other comments. — Tim Yates, Oct 02 '10 at 22:21
Interesting to see the speed difference for native code. It seems like this algorithm would be relatively faster for longer subsequences.. how long was/were the subsequence(s) you used in the test? — intuited, Oct 02 '10 at 22:37
You're right. This performed much better than eumiro's solution with larger subsequences, but it still didn't perform as well as Dave's. — Tim Yates, Oct 02 '10 at 22:50

ChessMaster · Answer 13 · 2012-04-05T23:26:08.297

I think this one is fast...

def issublist(subList, myList, start=0):
    if not subList: return 0
    lenList, lensubList = len(myList), len(subList)
    try:
        while lenList - start >= lensubList:
            start = myList.index(subList[0], start)
            for i in xrange(lensubList):
                if myList[start+i] != subList[i]:
                    break
            else:
                return start, start + lensubList - 1
            start += 1
        return False
    except:
        return False

score 0 · Answer 14 · edited May 08 '20 at 03:07

0

a=[[1,2] , [3,4] , [0,5,4]]
print(a.__contains__([0,5,4]))

It provides true output.

a=[[1,2] , [3,4] , [0,5,4]]
print(a.__contains__([1,3]))

It provides false output.

edited May 08 '20 at 03:07

Tanjim Ahmed Khan

650
1
9
21

answered Apr 23 '20 at 18:26

Sihat Afnan

742
7
14

This seems version specific. For example, it fails on LeetCode. – Abhijit Sarkar Jul 27 '21 at 11:42

score 0 · Answer 15 · answered Nov 25 '20 at 23:36

Dave answer is good. But I suggest this implementation which is more efficient and doesn't use nested loops.

def contains(small_list, big_list):
    """
    Returns index of start of small_list in big_list if big_list
    contains small_list, otherwise -1.
    """
    loop = True
    i, curr_id_small= 0, 0
    while loop and i<len(big_list):
        if big_list[i]==small_list[curr_id_small]:
            if curr_id_small==len(small_list)-1:
                loop = False
            else:
                curr_id_small += 1
        else:
            curr_id_small = 0
        i=i+1
    if not loop:
        return i-len(small_list)
    else:
        return -1

Granitosaurus · Answer 16 · 2022-05-13T06:13:53.977

Here's a simple and efficient function to check whether big list contains a small one in matching order:

def contains(big, small):
    i = 0
    for value in big:
        if value == small[i]:
            i += 1
            if i == len(small):
                return True
        else:
            i = 1 if value == small[0] else 0
    return False

Usage:

"""
>>> contains([1,2,3,4,5], [2,3,4])
True
>>> contains([4,2,3,2,4], [2,3,4])
False
>>> contains([1,2,3,2,3,2,2,4,3], [2,4,3])
True
"""

score -1 · Answer 17 · answered Sep 03 '18 at 19:24

The problem of most of the answers, that they are good for unique items in list. If items are not unique and you still want to know whether there is an intersection, you should count items:

from collections import Counter as count

def listContains(l1, l2):
  list1 = count(l1)
  list2 = count(l2)

  return list1&list2 == list1

print( listContains([1,1,2,5], [1,2,3,5,1,2,1]) ) # Returns True
print( listContains([1,1,2,8], [1,2,3,5,1,2,1]) ) # Returns False

You can also return the intersection by using ''.join(list1&list2)

score -1 · Answer 18 · answered Sep 23 '20 at 09:19

Here a solution with less line of code and easily understandable (or at least I like to think so).

If you want to keep order (match only if the smaller list is found in the same order on the bigger list):

def is_ordered_subset(l1, l2):
    # First check to see if all element of l1 are in l2 (without checking order)
    if not set(l1).issubset(l2): 
        return False

    length = len(l1)
    # Make sublist of same size than l1
    list_of_sublist = [l2[i:i+length] for i, x in enumerate(l2)]
    #Check if one of this sublist is l1
    return l1 in list_of_sublist

score -1 · Answer 19 · answered May 07 '21 at 08:22

-1

You can use numpy:

def contains(l1, l2):
   """ returns True if l2 conatins l1 and False otherwise """

   if len(np.intersect1d(l1,l2))==len(l1):
      return = True
   else:
      return = False

answered May 07 '21 at 08:22

Atila Abdula

1

How to test if a list contains another list as a contiguous subsequence?

19 Answers19

Used methods are:

Code for Time analysis (averaging over 1000 iterations):

Output:

Conclusion: In this case, Slicing operation proves to be the fastest

Linked

Related