Find a specific sequence of values in one list within a larger list

Question

I am trying to determine if List1=[0,0,0] is contained within List2 or List3 in the most efficient manner that I can and where:

List2=[34, 32, 25, 0, 0, 0, 32] with results True

List3=[34, 32, 25, 0, 32, 0, 0] with results False

I have tried set().subset but it returns True and True, I tried if List1 in List2 and get False

I know I can iterate through the list and do value,sequence comparisons but was wondering if there was already a function that did this kind of comparison and if not could this be done with a fairly simple lambda expression?

Note: List2 and List3 can be a lot longer those are just short examples showing the difference and more precisely what it is I am looking for

If you know your lists are small, you could do something like [this](https://stackoverflow.com/a/41577084/3282436) and then check if `List1 in allSubArrays(List2)`. But a generic solution is probably better. — 0x5453, Jul 19 '19 at 17:15
Possible duplicate of [Checking if list is a sublist](https://stackoverflow.com/questions/35964155/checking-if-list-is-a-sublist) — Error - Syntactical Remorse, Jul 20 '19 at 02:11
The answer selected within the **Checking if list is a sublist** question would indicated that the type of match is not the same kind of match I am looking for and further more the answer is not at all efficient even if it was the same kind of search -- so it potentially fails on 2 criterion levels and at least on 1 -- so no not a duplicate of that question — Dennis Jensen, Jul 22 '19 at 16:47

brentertainer · Answer 1 · 2019-07-19T17:28:53.400

2

I am not aware of a function designed for this. You can, however, achieve this with a lambda.

is_contiguous_subsequence = lambda small, big: any(small == big[i:i+len(small)] for i in range(len(big) - len(small) + 1))

This is a few too many characters to have in a lambda function for my taste, so I would suggest just making it a regular function.

def is_contiguous_subsequence(small, big):
    return any(small == big[i:i+len(small)] for i in range(len(big) - len(small) + 1))

By the nature of any, this will return True upon the first found match and not continue through the rest of the big list. Just a little efficiency bonus.

edited Jul 19 '19 at 17:28

answered Jul 19 '19 at 17:21

brentertainer

2,118
1
6
15

This is probably the best option for small searches. If you're really looking for efficiency you can beat this with a more complicated function. Essentially, check every `len(small)` character and see if it is in the `set(small)`: if not, then the substring cannot be in the skipped portion, dividing the search length by up to `len(small)`. You can also check only the first character of the sequence against every potentially valid `i` in `big`, although python might be doing this already under the hood for these comparisons, so that may not offer the speedup it would in other languages. – TemporalWolf Jul 19 '19 at 17:53
Thanks after viewing the lambda and inline function I decided making it into its own function call to be the best since it can include a break but used the above basic logic as follows : ```def IsSubSet(self, ShrtList, LongList): RetVal = False for idx in range ( len(LongList) - len(ShrtList) ): if LongList[idx:idx+len(ShrtList)] == ShrtList: RetVal = True break return RetVal``` – Dennis Jensen Jul 19 '19 at 18:03
@DennisJensen You do need the +1 tacked on to the difference of array lengths inside `range`. Otherwise, a case like `ShrtList = [0, 0, 0]`, `LongList = [1, 0, 0, 0]` will fail because the very last sequence of 3 items in `LongList` will not be checked. Also, the very last point I meant to convey in the answer is that `any` will stop on the first found match; it's like the break in your function, just built in. – brentertainer Jul 20 '19 at 02:43
While that is true in my instance it will not be as the shortest my list should be is 5 but thanks for the heads up on that – Dennis Jensen Jul 22 '19 at 15:17

Jon Betts · Answer 2 · 2019-07-19T18:50:09.517

As pointed in some of the comments, there is a tension between readable and efficient. This solution provides the index of the smaller list in the larger, and can be used for your problem by checking if the index is not None.

The following algorithm is about 2x faster than a more direct solution for small parent lists (length 6), but can be about 15x faster for longer lists (length 10,000).

The trick is to use the built in list.index() function on the each item to quickly skip through the parent list. If we notice any gaps bigger than 1 between indexes we know the sequence is broken, but we can start off near this point wherever it is.

def index_of(parent_list, sub_list):
    # No match possible
    if len(sub_list) > len(parent_list):
        return

    # Empty list 'matches' at index 0
    if not sub_list:
        return 0

    sequence_start = 0
    while True:
        try:
            match_found, offset = _sub_match(
                parent_list, sub_list, sequence_start)
        except ValueError:
            return

        if match_found:
            return sequence_start

        sequence_start = offset


def _sub_match(parent_list, sub_list, start_at):
    pos, last_offset = 0, start_at - 1
    # Skip through the items looking for the next index after the one before

    for item in sub_list:
        offset = parent_list.index(item, last_offset + 1)

        # We jumped more than one value, so the sequence is broken
        if offset - last_offset != 1:
            return False, offset - pos

        pos += 1
        last_offset = offset

    return True, last_offset

This could be an improvement, although it's worth the caution that some data will degenerate to the base case: `[0,0,0,1]` in `[0,0,0,0,0,0,0,0,0,0,0,0]` will give us essentially the same run as the simple brute force method. — TemporalWolf, Jul 19 '19 at 18:08
That's a fair point. The worst case performance is still bad, but with random-ish numbers, the average case is much better. The less frequently the first item key appears, the faster it is. I wonder if you could pull the same trick of indexing with each part? This means we would skip straight to the end? — Jon Betts, Jul 19 '19 at 18:13
In the case above, (the kind you might expect from a binary sensor), knowing your data and writing a specialist algorithm for that data would be your best bet. If you know the data will be mostly 0, then searching for other parts of the string makes more sense. If you have a bunch of substrings for comparison you could even pre-process the larger string to extract that sort of information: the parent contains mostly 0s (don't use those for searching), no 2s (any substring with 2s can be dismissed), etc. — TemporalWolf, Jul 19 '19 at 18:30
@TemporalWolf Probably true. Thanks for pointing out the worst case scenario. I've updated the method with a version which doesn't suffer from this. I'm assuming this is a one time check, and pre-processing isn't a good trade. I'm also assuming the algo is for general use (to match the title of the question) — Jon Betts, Jul 19 '19 at 18:44

Dennis Jensen · Accepted Answer · 2019-07-22T17:02:01.180

1

First thanks go out to both @brentertainer and @jon-betts for their insights. Now to reiterate all I needed to know is if the SubList is contained within the FullList still as such I saw the efficiency enhancement in what @jon-betts posted but implemented it as follows instead:

class ClassContainer:
    # This handles everything pertinent to this Class
    def __init__(self):
        self.ClassName = 'ThisClass'

    @staticmethod
    def IsSubSet(SubList, FullList):
        RetVal = False
        Item = SubList[0]
        Range = len(FullList) - len(SubList)
        LenAdjtr = len(SubList)

        for idx in range ( Range ):
            idx = FullList.index(Item, idx)
            if idx > Range:
                break
            if FullList[idx:(idx + LenAdjtr)] == SubList:
                RetVal = True
                break

        return RetVal

This definitely stream-lines the function greatly for longer sequences which will come in handy for this application but does not concern itself with nit-picking through the SubList just does a straight compare of its full value which seemed to be even more efficient.

edited Jul 22 '19 at 17:02

answered Jul 22 '19 at 16:38

Dennis Jensen

214
1
14

Based on the arguments this function accepts (specifically `self`), it appears you're implementing this function as an instance method. If that is the case, it would make sense to remove `self` as an argument and decorate with `@staticmethod` since the instance is not used in the function body. Otherwise, just remove `self` as an argument. – brentertainer Jul 22 '19 at 16:48
Okay @brentertainer I put the class wrapper around it so that you can see why it was declared with the "self" aspect -- it was not a stand-alone method but a method within a class -- without the "self" python complains about it being in the class -- perhaps I need a different value than "self" but the error message says to add "self" and that stops it complaining – Dennis Jensen Jul 22 '19 at 16:57
Nvm I dug into what you stated and found out what you meant -- so I have added that in -- thanks for the heads up on that – Dennis Jensen Jul 22 '19 at 17:02
I probably could have said it better. But yes, your last edit addresses what I was trying to say. Thanks! – brentertainer Jul 22 '19 at 17:08
No thank you - as I learned yet another aspect of Python I had not known about but in the back of my mind wondered about but had not put time towards pursuing the answer for it – Dennis Jensen Jul 22 '19 at 17:29

dildeolupbiten · Answer 4 · 2019-07-19T18:04:36.677

0

Here is the functional programming way:

List1 = [0, 0, 0]

List2 = [34, 32, 25, 0, 0, 0]

f = lambda *args: True if \
    list(filter(lambda i: i in args[1] and \
    args[0].count(i) <= args[1].count(i), args[0])) == args[0] else False

print(f(List1, List2))
# True

edited Jul 19 '19 at 18:04

answered Jul 19 '19 at 17:41

dildeolupbiten

1,314
1
15
27

score 0 · Answer 5 · answered Jul 19 '19 at 18:13

I like this way, defining a method which returns a generator:

def each_cons(ary, n = 2):
  if n < 2: n = 1
  i, size = 0, len(ary)
  while i < size-n+1:
    yield ary[i:i+n]
    i += 1

Then you can use like this:

list2= = [34, 32, 25, 0,  0, 0, 32]
chk = [0,0,0]

chk in each_cons(list2, len(chk))
#=> True

Self-explanation:

res = each_cons(list2, 3)
print(list(res))
#=> [[34, 32, 25], [32, 25, 0], [25, 0, 0], [0, 0, 0], [0, 0, 32]]

Find a specific sequence of values in one list within a larger list

5 Answers5