42

I want to write a function that determines if a sublist exists in a larger list.

list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]

#Should return true
sublistExists(list1, [1,1,1])

#Should return false
sublistExists(list2, [1,1,1])

Is there a Python function that can do this?

SaeX
Jonathan
  • Will your lists always contain only 0 or 1? – Mark Byers Jul 22 '10 at 21:38
  • Is this for Python 2.x or 3.x? – Mark Byers Jul 22 '10 at 21:51
  • 3
    Ah - I see the gotcha here. You are not looking for something being a subset of the other set - but that it must match in order a slice of the list. – Danny Staple Nov 26 '13 at 14:50
  • 1
    See also answer using KMP (Knuth-Morris-Pratt) algorithm: [python - Best way to determine if a sequence is in another sequence? - Stack Overflow](https://stackoverflow.com/questions/425604/best-way-to-determine-if-a-sequence-is-in-another-sequence) – user202729 Dec 05 '21 at 18:08

11 Answers

47

Let's get a bit functional, shall we? :)

def contains_sublist(lst, sublst):
    n = len(sublst)
    return any((sublst == lst[i:i+n]) for i in range(len(lst)-n+1))

Note that any() stops at the first match of sublst within lst. If there is no match, it returns False after O(m*n) operations (m being the length of lst and n the length of sublst).
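If the position of the match is also of interest, the same generator can feed next() instead of any(). A minimal sketch: the name `find_sublist` and the -1 convention for "not found" are assumptions made for illustration, not part of the answer above.

```python
def find_sublist(lst, sublst):
    """Return the index of the first occurrence of sublst in lst, or -1."""
    n = len(sublst)
    # next() stops at the first matching window, just like any() does.
    return next(
        (i for i in range(len(lst) - n + 1) if lst[i:i + n] == sublst),
        -1,
    )


list1 = [1, 0, 1, 1, 1, 0, 0]
list2 = [1, 0, 1, 0, 1, 0, 1]
print(find_sublist(list1, [1, 1, 1]))  # 2
print(find_sublist(list2, [1, 1, 1]))  # -1
```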

Błażej Michalik
Nas Banov
22

If you are sure that your inputs will only contain the single digits 0 and 1 then you can convert to strings:

def sublistExists(list1, list2):
    return ''.join(map(str, list2)) in ''.join(map(str, list1))

This creates two strings, so it is not the most efficient solution, but since it takes advantage of Python's optimized string searching algorithm it's probably good enough for most purposes.

If efficiency is very important you can look at the Boyer-Moore string searching algorithm, adapted to work on lists.

A naive search has O(n*m) worst-case complexity, but it can be suitable if you cannot use the string-conversion trick and you don't need to worry about performance.
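For lists of small non-negative integers (0 to 255), such as the 0/1 lists in the question, converting to `bytes` instead of `str` avoids the multi-digit ambiguity of joined strings while still using the built-in substring search. A sketch under that assumption, for Python 3 only; the function name is hypothetical.

```python
def sublist_exists_bytes(lst, sublst):
    """Containment test for lists whose items are all ints in range(256)."""
    # bytes() raises ValueError for ints outside 0..255, so unsuitable
    # input fails loudly instead of giving a wrong answer.
    return bytes(sublst) in bytes(lst)


print(sublist_exists_bytes([1, 0, 1, 1, 1, 0, 0], [1, 1, 1]))  # True
print(sublist_exists_bytes([1, 0, 1, 0, 1, 0, 1], [1, 1, 1]))  # False
print(sublist_exists_bytes([10], [1, 0]))                      # False
```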

Mark Byers
  • 4
    `--` : the code is seriously broken, try `sublistExists([10], [1,0])` == True?! – Nas Banov Jul 23 '10 at 01:46
  • 13
    @Nas Banov: That's why Mark wrote in his first sentence "If you are sure that your inputs will only contain single characters '0' and '1'..." – Tim Pietzcker Jul 23 '10 at 06:03
  • 1
    @Tim: But the inputs don't contain "single characters '0' and '1'", mind you! The example shown contains only the numbers `0` and `1` (or "digits" if you will). :) Besides, his code is slightly more broad - it will handle correct any list of 1-chars or any list of 1-digit numbers (but not both). And it's fairly easy to fix by introducing separator to `str.join` – Nas Banov Jul 23 '10 at 07:33
  • I agree with you about Boyer-Moore. I've posted an answer with an implementation. –  Mar 02 '17 at 23:14
  • @Nas Banov Just to expand on/reiterate your comment, if you replace "" with , it works. So you do need to determine a separator based on the data, but you don't necessarily have to restrict inputs to single characters. – Chris Coffee Nov 05 '22 at 02:21
4

No function that I know of

def sublistExists(lst, sublist):
    # "lst" rather than "list", to avoid shadowing the built-in type
    for i in range(len(lst)-len(sublist)+1):
        if sublist == lst[i:i+len(sublist)]:
            return True #return position (i) if you wish
    return False #or -1

As Mark noted, this is not the most efficient search (it's O(n*m)). This problem can be approached in much the same way as string searching.
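One string-searching technique that carries over directly to lists is Knuth-Morris-Pratt, which runs in O(n + m) time. The following is a sketch rather than a tuned implementation; the name `kmp_find` and the -1 return value for "not found" are assumptions made here.

```python
def kmp_find(haystack, needle):
    """Return the index of the first occurrence of needle in haystack, or -1."""
    if not needle:
        return 0
    # failure[i] = length of the longest proper prefix of needle[:i + 1]
    # that is also a suffix of it.
    failure = [0] * len(needle)
    k = 0
    for i in range(1, len(needle)):
        while k and needle[i] != needle[k]:
            k = failure[k - 1]
        if needle[i] == needle[k]:
            k += 1
        failure[i] = k
    # Scan the haystack once, never moving backwards in it.
    k = 0
    for i, item in enumerate(haystack):
        while k and item != needle[k]:
            k = failure[k - 1]
        if item == needle[k]:
            k += 1
        if k == len(needle):
            return i - k + 1
    return -1


print(kmp_find([1, 0, 1, 1, 1, 0, 0], [1, 1, 1]))  # 2
print(kmp_find([1, 0, 1, 0, 1, 0, 1], [1, 1, 1]))  # -1
```

It only relies on `==` between items, so it works for lists of any comparable objects, not just digits.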

sas4740
4

My favourite simple solution is the following (however, it is brute force, so I don't recommend it for huge data):

>>> l1 = ['z','a','b','c']
>>> l2 = ['a','b']
>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True

The code above creates one slice of l1 starting at each index, each at most as long as l2, and compares them with l2 one after another.
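The slices being compared can also be listed explicitly. A small sketch: bounding the range at len(l1) - len(l2) + 1 is an adjustment made here (it is not in the one-liner above) so that only full-length slices are produced.

```python
l1 = ['z', 'a', 'b', 'c']
l2 = ['a', 'b']

# Stop at the last index where a full-length slice still fits.
windows = [l1[i:i + len(l2)] for i in range(len(l1) - len(l2) + 1)]
print(windows)        # [['z', 'a'], ['a', 'b'], ['b', 'c']]
print(l2 in windows)  # True
```

Unlike the any() version, this builds every slice up front, so it is meant for inspection rather than for large inputs.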

Detailed explanation

Read this explanation only if you don't understand how it works and want to know; otherwise there is no need to read it.

Firstly, this is how you can iterate over indexes of l1 items:

>>> [i for i in range(len(l1))]
[0, 1, 2, 3]

So, because i represents the index of an item in l1, you can use it to show the actual item instead of the index number:

>>> [l1[i] for i in range(len(l1))]
['z', 'a', 'b', 'c']

Then create slices (sub-selections of items from a list) from l1 with the length of l2:

>>> [l1[i:i+len(l2)] for i in range(len(l1))]
[['z', 'a'], ['a', 'b'], ['b', 'c'], ['c']] #last one is shorter, because there is no next item.

Now you can compare each slice with l2 and see that the second one matches:

>>> [l1[i:i+len(l2)] == l2 for i in range(len(l1))]
[False, True, False, False] #notice that the second one is that matching one

Finally, with the function named any, you can check whether at least one of the booleans is True:

>>> any(l1[i:i+len(l2)] == l2 for i in range(len(l1)))
True
Jan Musil
3

The efficient way to do this is to use the Boyer-Moore algorithm, as Mark Byers suggests. I have done it already here: Boyer-Moore search of a list for a sub-list in Python, but will paste the code here. It's based on the Wikipedia article.

The search() function returns the index of the sub-list being searched for, or -1 on failure.

def search(haystack, needle):
    """
    Search list `haystack` for sublist `needle`.
    """
    if len(needle) == 0:
        return 0
    char_table = make_char_table(needle)
    offset_table = make_offset_table(needle)
    i = len(needle) - 1
    while i < len(haystack):
        j = len(needle) - 1
        while needle[j] == haystack[i]:
            if j == 0:
                return i
            i -= 1
            j -= 1
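        # Items that do not occur in needle shift by the full needle length;
        # without that default, .get() returns None and max() raises TypeError on Python 3.
        i += max(offset_table[len(needle) - 1 - j], char_table.get(haystack[i], len(needle)))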
    return -1

    
def make_char_table(needle):
    """
    Makes the jump table based on the mismatched character information.
    """
    table = {}
    for i in range(len(needle) - 1):
        table[needle[i]] = len(needle) - 1 - i
    return table
    
def make_offset_table(needle):
    """
    Makes the jump table based on the scan offset in which mismatch occurs.
    """
    table = []
    last_prefix_position = len(needle)
    for i in reversed(range(len(needle))):
        if is_prefix(needle, i + 1):
            last_prefix_position = i + 1
        table.append(last_prefix_position - i + len(needle) - 1)
    for i in range(len(needle) - 1):
        slen = suffix_length(needle, i)
        table[slen] = len(needle) - 1 - i + slen
    return table
    
def is_prefix(needle, p):
    """
    Is needle[p:end] a prefix of needle?
    """
    j = 0
    for i in range(p, len(needle)):
        if needle[i] != needle[j]:
            return False
        j += 1
    return True
    
def suffix_length(needle, p):
    """
    Returns the maximum length of the substring ending at p that is a suffix.
    """
    length = 0
    j = len(needle) - 1
    for i in reversed(range(p + 1)):
        if needle[i] == needle[j]:
            length += 1
        else:
            break
        j -= 1
    return length

Here is the example from the question:

def main():
    list1 = [1,0,1,1,1,0,0]
    list2 = [1,0,1,0,1,0,1]
    index = search(list1, [1, 1, 1])
    print(index)
    index = search(list2, [1, 1, 1])
    print(index)

if __name__ == '__main__':
    main()

Output:

2
-1
KyleMit
1

Here is a way that will work for simple lists that is slightly less fragile than Mark's

def sublistExists(haystack, needle):
    def munge(s):
        return ", "+str(s)[1:-1]+","
    return munge(needle) in munge(haystack)
John La Rooy
1
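def sublistExists(x, y):
  # Candidate start positions: every index where x matches the first item of y.
  occ = [i for i, a in enumerate(x) if a == y[0]]
  for b in occ:
      if x[b:b+len(y)] == y:
           print('YES-- SUBLIST at : ', b)
           return True
  # Reached when no candidate matched, including when there were no candidates.
  print('NO SUBLIST')
  return False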

list1 = [1,0,1,1,1,0,0]
list2 = [1,0,1,0,1,0,1]

#should return True
sublistExists(list1, [1,1,1])

#Should return False
sublistExists(list2, [1,1,1])
SuperNova
0

Might as well throw in a recursive version of @NasBanov's solution

def foo(sub, lst):
    '''Checks if sub is in lst.

    Expects both arguments to be lists
    '''
    if len(lst) < len(sub):
        return False
    return sub == lst[:len(sub)] or foo(sub, lst[1:])
wwii
  • Recursion... Can cause a stack overflow on long lists – Tigran Saluev Nov 29 '16 at 14:55
  • @TigranSaluev - stack overflow or maximum recursion depth or RecursionError? – wwii Nov 29 '16 at 18:37
  • 1
    RuntimeError: maximum recursion depth exceeded in cmp – Tigran Saluev Nov 30 '16 at 09:57
  • "Might as well"--hmm, why, exactly? This recursive approach seems to have no redeeming qualities compared to the iterative version. It seems longer, less efficient, more error-prone, and less understandable. (I have nothing against recursion in general.) – Joshua P. Swanson Mar 06 '17 at 00:50
  • @wwii: Alrighty :) I was wondering if you had a particular reason to do it recursively, but it seems it was just because it could be done. Given the recursion depth issue in particular, it does seem like a bad solution. – Joshua P. Swanson Mar 07 '17 at 05:37
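The recursion-depth problem raised in these comments can be reproduced with a short script. A sketch assuming Python 3.5 or later, where the error is raised as RecursionError (the comment above shows the older RuntimeError); the list length is chosen here just to exceed the interpreter's limit.

```python
import sys


def foo(sub, lst):
    """Recursive check from the answer above: is sub a contiguous part of lst?"""
    if len(lst) < len(sub):
        return False
    return sub == lst[:len(sub)] or foo(sub, lst[1:])


# One stack frame per skipped element, so a list without a match that is
# longer than the recursion limit cannot be searched this way.
long_list = [0] * (sys.getrecursionlimit() + 100)
try:
    foo([1], long_list)
except RecursionError as exc:
    print("RecursionError:", exc)
```

Each call also copies the remainder of the list with lst[1:], which adds up to quadratic work on top of the depth limit.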
0
def sublist(l1,l2):
  # True if l1 occurs as a contiguous run inside l2.
  if len(l1) <= len(l2):
    for i in range(0, len(l2) - len(l1) + 1):
      if l2[i:i+len(l1)] == l1:
        return True
  return False
-2

I know this might not be quite relevant to the original question, but it might be a very elegant one-line solution for someone else if the order of items in both lists doesn't matter. The result below is True if all elements of List1 are in List2 (regardless of order). If the order matters, don't use this solution.

List1 = [10, 20, 30]
List2 = [10, 20, 30, 40]
result = set(List1).intersection(set(List2)) == set(List1)
print(result)

Output

True
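To make the caveat concrete, here is a short comparison on the second list from the question. It is a sketch only; the variable names are chosen for illustration.

```python
list2 = [1, 0, 1, 0, 1, 0, 1]
sub = [1, 1, 1]

# Set comparison ignores both order and repetition.
set_result = set(sub).issubset(list2)
print(set_result)  # True, although [1, 1, 1] is not a slice of list2

# Contiguous, order-sensitive check for comparison.
n = len(sub)
slice_result = any(list2[i:i + n] == sub for i in range(len(list2) - n + 1))
print(slice_result)  # False
```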
Chadee Fouad
-4

If I am understanding this correctly, you have a larger list, like:

list_A= ['john', 'jeff', 'dave', 'shane', 'tim']

then there are other lists

list_B= ['sean', 'bill', 'james']

list_C= ['cole', 'wayne', 'jake', 'moose']

and then I append lists B and C to list A

list_A.append(list_B)

list_A.append(list_C)

so when I print list_A

print (list_A)

I get the following output

['john', 'jeff', 'dave', 'shane', 'tim', ['sean', 'bill', 'james'], ['cole', 'wayne', 'jake', 'moose']]

Now I want to check whether a sublist exists:

for value in list_A:
    # isinstance() checks the type directly, without parsing str(type(value))
    if isinstance(value, list):
        print("True")
    else:
        print("False")

This prints 'True' for every element of the larger list that is itself a list, and 'False' for every other element.

Suhail