Find consecutive and nonconsecutive ordered sequences of items in a list

Question

I have two lists:

lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]

I want to count how many times the lookup_list appeared in my_list with the following logic:

The order should be 1 -> 2 -> 3
In my_list, the lookup_list items doesn't have to be next to each other: 1,4,2,1,5,3 -> should generate a match since there is a 2 comes after a 1 and a 3 comes after 2.

The mathces based on the logic:

1st match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]

2nd match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]

3rd match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]

4th match: [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]

The lookup_list is dynamic, it could be defined as [1,2] or [1,2,3,4], etc. How can I solve it? All the answers I've found is about finding matches where 1,2,3 appears next to each other in an ordered way like this one: Find matching sequence of items in a list

I can find the count of consecutive sequences with the below code but it doesn't count the nonconsecutive sequences:

from nltk import ngrams
lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
all_counts = Counter(ngrams(l2, len(l1)))
counts = {k: all_counts[k] for k in [tuple(lookup_list)]}
counts
>>> {(1, 2, 3): 2}

I tried using pandas rolling window functions but they don't have a custom reset option.

Welcome to Stack Overflow. This is a [questions and answers site](https://stackoverflow.com/about), not a code-writing service. Please read through [How to Ask](https://stackoverflow.com/help/how-to-ask) and [edit] your question to reflect your work. — ljmc, Jan 02 '23 at 14:26
You need to specify what you count as a matching subsequence. There are numerous examples of the subsequence consisting of `1,2,3` that you are *not* counting in your example. You seem to have some sort of unstated disjointness criterion. — John Coleman, Jan 02 '23 at 14:51
@JohnColeman, the sequence should be ordered: 1 then 2 then 3. However, they don't have to appear one after another, meaning that there can be other items between, for example, 5 can appear between 1 and 2. When there is a match, the matched items should not be counted again. — mrgn, Jan 02 '23 at 15:32
If matched items aren't counted again, then how do you determine which of potentially several matches a given item should appear in? — John Coleman, Jan 02 '23 at 15:39
It's like a rolling/sliding window implementation but I don't have a specific boundary to limit the window. Imagine you are adding items to your cart, you added 5 items, then bought only one of the items. The rest 4 items remained in your cart. You added another 3 items to your cart, and bought 2 of the 7 items in your cart. I want to know which items(or how many items) went through the "add to cart" -> "check out" -> "payment" process. — mrgn, Jan 02 '23 at 16:02

Olvin Roght · Accepted Answer · 2023-01-02T16:18:14.713

def find_all_sequences(source, sequence):

    def find_sequence(source, sequence, index, used):
        for i in sequence:
            while True:
                index = source.index(i, index + 1)
                if index not in used:
                    break
            yield index
    
    first, *rest = sequence
    index = -1
    used = set()
    while True:
        try:
            index = source.index(first, index + 1)
            indexes = index, *find_sequence(source, rest, index, used)
        except ValueError:
            break
        else:
            used.update(indexes)
            yield indexes

Usage:

lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]
print(*find_all_sequences(my_list, lookup_list), sep="\n")

Output:

(0, 1, 2)
(6, 7, 11)
(9, 10, 15)
(14, 16, 17)

Generator function find_all_sequences() yields tuples with indexes of sequence matches. In this function we initialize loop which will be stopped when list.index() call will throw ValueError. Internal generator function find_sequence() yields index of every sequence item.

According to this benchmark, my method is about 60% faster than one from Andrej Kesely's answer.

Sorry misunderstood the output. In your function, the 3rd match should be `((9, 1), (10, 2), (15, 3))` since the index `11` is already counted in the previous match. — mrgn, Jan 02 '23 at 15:50
@mrgn, sorry I've missed that requirement. Edited my answer with fix — Olvin Roght, Jan 02 '23 at 16:12

score 1 · Answer 2 · answered Jan 02 '23 at 14:47

The function find_matches() returns indices where the matches from lookup_list are:

def find_matches(lookup_list, lst):
    buckets = []

    def _find_bucket(i, v):
        for b in buckets:
            if lst[b[-1]] == lookup_list[len(b) - 1] and v == lookup_list[len(b)]:
                b.append(i)

                if len(b) == len(lookup_list):
                    buckets.remove(b)
                    return b
                break
        else:
            if v == lookup_list[0]:
                buckets.append([i])

    rv = []
    for i, v in enumerate(my_list):
        b = _find_bucket(i, v)
        if b:
            rv.append(b)
    return rv


lookup_list = [1, 2, 3]
my_list = [1, 2, 3, 4, 5, 2, 1, 2, 2, 1, 2, 3, 4, 5, 1, 3, 2, 3, 1]


print(find_matches(lookup_list, my_list))

Prints:

[[0, 1, 2], [6, 7, 11], [9, 10, 15], [14, 16, 17]]

score 0 · Answer 3 · answered Jan 02 '23 at 15:19

Here is a recursive solution:

lookup_list = [1,2,3]
my_list = [1,2,3,4,5,2,1,2,2,1,2,3,4,5,1,3,2,3,1]


def find(my_list, continue_from_index):
    if continue_from_index > (len(my_list) - 1):
        return 0
    
    last_found_index = 0
    found_indizes = []
    first_occuring_index = 0
    found = False
    for l in lookup_list:
        for m_index in range(continue_from_index, len(my_list)):
            if my_list[m_index] is l and m_index >= last_found_index:
                if not found:
                    found = True
                    first_occuring_index = m_index
                
                last_found_index = m_index
                found += 1
                found_indizes.append(str(m_index))
                break
        
    if len(found_indizes) is len(lookup_list):
        return find(my_list, first_occuring_index+1) + 1
    
    return 0
            
print(find(my_list, 0))

I highly recommend you to use `==` for comparison integers. Your code works will work with small integers because of interpreter optimizations, but it's extremely unstable. Read: [Is there a difference between "==" and "is"?](https://stackoverflow.com/q/132988/10824407). — Olvin Roght, Jan 02 '23 at 15:25

Shariff Mohammad · Answer 4 · 2023-01-02T15:50:57.620

-1

my_list = [5, 6, 3, 8, 2, 1, 7, 1]
lookup_list = [8, 2, 7]
counter =0
result =False
for i in my_list:
    if i in lookup_list:
        counter+=1

if(counter==len(lookup_list)):
     result=True

print (result)

edited Jan 02 '23 at 15:50

answered Jan 02 '23 at 15:37

Shariff Mohammad

1
1

Find consecutive and nonconsecutive ordered sequences of items in a list

4 Answers4