Subsequence match for a simple list and a list of sets?

Question

This is a problem involving subsequences. The ordinary subsequence problem has a list l1 and asks whether another list l2 is a subsequence. For example:

l1 = [3, 1, 4, 1, 5, 9]

l2 = [3, 1, 5]            => True (remove 4, 1, 9)

l2 = [4, 3, 5]            => False

In my problem, l2 instead has sets of values. For example:

l1 = [3, 1, 4, 1, 5, 9]
l2 = [{2, 3}, {1, 5, 2}, {9}]
=> True, because:
     {2, 3} matches 3
     {1, 5, 2} matches 1 (or 5)
     {9} matches 9

Here, l2 can be thought of conceptually expanded to different possibilities by taking one element from each set:

l2exp = [[2, 1, 9], [2, 5, 9], [2, 2, 9], [3, 1, 9], [3, 5, 9], [3, 2, 9]]

which means as long as one of those six possible lists represented by l2 matches l1, we have a successful match. Since [3, 1, 9] matches l1 , the whole l2 matches.

So one solution may be first flatten the l2 into the new l2exp like above, and then for each sublist_l2 in l2exp, the ordinary subsequence check all(e in iter_of_l1 for e in sublist_l2) can be used.

How to do this match without explicitly expanding the l2 into a list of lists?

in your example, `l1` matches `l2` because the elements `[1,3]` are in `[1,2,3,4]`? If that's the case, `[2,3]` would natch too, right? Is matching order dependent? What I'm saying here is, please clarify what you mean by match. — Puff, Dec 23 '21 at 18:14
What does it mean *"l2 can be conceptually expanded"*? Why does `[1, 3]` matches `l1`? This is just not clear — Tomerikoo, Dec 23 '21 at 18:16
The sequence from l2 is formed as: take each element from a set in l2 and put it into a list, so that you can get multiple lists. [1,3] matches [1,2,3,4] because it is a subsequence of l1. — marlon, Dec 23 '21 at 18:19
By sub-sequence you mean the numbers in the sublist appear ordered in the bigger list? Plase edit your question with the info clarifying what you mean by match. Maybe provide examples of what wouldn't match. — Puff, Dec 23 '21 at 18:21
Does this answer your question? [How to check if a list is a subsequence of another list in order (duplicate)](https://stackoverflow.com/q/63258894/6045800) — Tomerikoo, Dec 23 '21 at 18:24
@Tomerikoo Doing a subsequence check for every `product` can be quite inefficient (if the sets are large or you have many, and most/all subsequence checks fail). But see the OP's previous two questions, which are XY questions of this one. — Kelly Bundy, Dec 23 '21 at 18:42
@KellyBundy Questions should be self-contained. I am not supposed to check other questions to understand this one. It's just a shame that it's hard to explain because once I understood it it's quite interesting and your solution is especially neat — Tomerikoo, Dec 23 '21 at 18:44
@Tomerikoo Hmm, I think I would've understood this (even the original version) without having seen the previous questions. I mentioned those rather as an "in case you're curious". And yes, while the pure "subsequence" question is old and boring, this one is a nice twist on it that I hadn't seen. And I enjoy iterators and isdisjoint :-) — Kelly Bundy, Dec 23 '21 at 18:53
I rewrote the question so that in my opinion it's clearer. Hope that's ok. — Kelly Bundy, Dec 24 '21 at 09:10

Kelly Bundy · Answer 1 · 2021-12-24T09:41:24.337

Solutions

Instead of checking equality of the l1 and l2 elements, check for membership of the l1 elements in the l2 elements.

Some possible implementations (Try it online!):

def subseq(l1, l2):
    it1 = iter(l1)
    return all(any(x in pool for x in it1)
               for pool in l2)

def subseq(l1, l2):
    it1 = iter(l1)
    return all(not pool.isdisjoint(it1)
               for pool in l2)

def subseq(l1, l2):
    it1 = iter(l1)
    return not any(pool.isdisjoint(it1)
                   for pool in l2)

def subseq(l1, l2):
    return not any(map(set.isdisjoint, l2, repeat(iter(l1))))

def subseq(l1, l2):
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                break
        else:
            return
    return True

Benchmarks

Testing with

l1 = [0, 1, 2, 3, ..., 39]
l2 = [{0, 1}, {2, 3}, ..., {40, 41}]

Times:

5254.183 ms  subseq_product
   0.020 ms  subseq_all_any
   0.006 ms  subseq_all_not_disjoint
   0.005 ms  subseq_not_any_disjoint
   0.005 ms  subseq_map_disjoint
   0.003 ms  subseq_loops
1279.553 ms  subseq_Alain
   0.018 ms  subseq_Alain_faster

subseq_product does the idea from the question, going through all 2²¹ possibilities of l2, doing an ordinary subsequence check for each.
subseq_Alain tries 2²⁰ (?) recursive paths. The _faster version greedily takes each l2-element's first match and doesn't try further.

Trying the fast solutions with longer inputs:

l1 = [0, 1, 2, 3, ..., 999]
l2 = [{0, 1}, {2, 3}, ..., {1000, 1001}]

Times:

   0.285 ms  subseq_all_any
   0.058 ms  subseq_all_not_disjoint
   0.057 ms  subseq_not_any_disjoint
   0.031 ms  subseq_map_disjoint
   0.044 ms  subseq_loops
   1.488 ms  subseq_Alain_faster

Full benchmark code (Try it online!):

from timeit import timeit
from itertools import product, repeat

def subseq_product(l1, l2):

    def is_subseq(x, y):
        # Ordinary subsequence test,
        # see https://stackoverflow.com/a/24017747
        it = iter(y)
        return all(c in it for c in x)

    return any(is_subseq(p2, l1)
               for p2 in product(*l2))

def subseq_all_any(l1, l2):
    it1 = iter(l1)
    return all(any(x in pool for x in it1)
               for pool in l2)

def subseq_all_not_disjoint(l1, l2):
    it1 = iter(l1)
    return all(not pool.isdisjoint(it1)
               for pool in l2)

def subseq_not_any_disjoint(l1, l2):
    it1 = iter(l1)
    return not any(pool.isdisjoint(it1)
                   for pool in l2)

def subseq_map_disjoint(l1, l2):
    return not any(map(set.isdisjoint, l2, repeat(iter(l1))))

def subseq_loops(l1, l2):
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                break
        else:
            return
    return True

def subseq_Alain(A, S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0] and subseq_Alain(A[i+1:],S[1:]): # and matching rest
            return True            # found full match
    return False

def subseq_Alain_faster(A, S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0]:
            return subseq_Alain_faster(A[i+1:],S[1:]) # and matching rest
    return False

def benchmark(funcs, args, number):
    for _ in range(3):
        for func in funcs:
            t = timeit(lambda: func(*args), number=number) / number
            print('%8.3f ms ' % (t * 1e3), func.__name__)
        print()

l1 = list(range(40))
l2 = [{i, i+1} for i in range(0, 42, 2)]
funcs = [
    subseq_product,
    subseq_all_any,
    subseq_all_not_disjoint,
    subseq_not_any_disjoint,
    subseq_map_disjoint,
    subseq_loops,
    subseq_Alain,
    subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 1)

l1 = list(range(1000))
l2 = [{i, i+1} for i in range(0, 1002, 2)]
funcs = [
    subseq_all_any,
    subseq_all_not_disjoint,
    subseq_not_any_disjoint,
    subseq_map_disjoint,
    subseq_loops,
    subseq_Alain_faster,
]
benchmark(funcs, (l1, l2), 500)

This doesn't seem to be what is asked for... `any(x in pool for x in it1)` will match for example `[1, 5]` while `5` is not even in the list and you don't check for order... — Tomerikoo, Dec 23 '21 at 18:25
@Tomerikoo, you need to specify how you get the [1,5] from the list of sets. — marlon, Dec 23 '21 at 18:29
@marlon How is it working? With `l1 = [1, 2, 3, 4]` and `l2 = [{5, 1}, {7, 3}]` it returns True... How does that make sense? — Tomerikoo, Dec 23 '21 at 18:31
Because from your l2, you can get [1,3] which is sub sequence of l1. — marlon, Dec 23 '21 at 18:32
Interesting. I think I get it now. Nice solution. Gotta love iterators... — Tomerikoo, Dec 23 '21 at 18:37
Very nice use of iterators combined with the internal short circuiting of isdisjoint. +1 — Alain T., Dec 23 '21 at 19:04
@KellyBundy can you explain a little more: why isdisjoint() can take an iterator while it is supposed to be a set? — marlon, Dec 23 '21 at 19:36
@marlon Because it can :-). Just like `issubset` and a bunch of others. From [its documentation](https://docs.python.org/3/library/stdtypes.html#frozenset.isdisjoint), scroll down to the paragraph saying *"Note, the non-operator versions of union(), ..., and issuperset() methods will accept any iterable as an argument"*. Looks like they just forgot to include `isdisjoint` in that list. — Kelly Bundy, Dec 23 '21 at 19:56
@KellyBundy I just realized that my problem not only need to return bool flag, but also need to return the matched sequence, for example, [1,3] from the original list of sets. In that case, how to extract the matched list from the all(not pool.isdisjoint(it1)) version? — marlon, Dec 23 '21 at 20:02
@marlon For the `isdisjoint` solutions, I might try a generator as `it1`, capturing the last used value with an assignment expression. But at that point it lost its advantage, so I don't think I'd use `isdisjoint` for that, but rather [a variation of my first solution](https://tio.run/##nY8/C4MwEMX3fIqDLhGk4J@hFLp269ZNHNIaaSBcQoxUET@7vSgWu7V9S7j38n7c2d4/DGYH66apkjU07e0i/P3BdRKDTqMjA9JTeZRNAycoytlQPqFBeenoY7RYNQitucCed6AQrDEaBFaAxq@AvbBWYsW7pbJRbRzMNSJ/hCGYUZS99wly0rcOVzJjOmxU0NZpDFkMecl0GpyB5mSMYaAoG0tmnULPt3cSNoIdXF0rv8XQG8LfcDk1/8adhW7kNL0A) (using Alain's tests, see the Output section). — Kelly Bundy, Dec 23 '21 at 20:21
Yes. As long as it works, I will use the variation of the first solution. — marlon, Dec 23 '21 at 20:32
'not witness.append(x)': Does this mean if x not in witness, then append it to the list? Does append() return bool? — marlon, Dec 23 '21 at 20:45
@marlon `append` always returns `None`. So if the actual check `x in pool` succeeds, then my added `and not witness.append(x)` is just `and not None`, i.e., `and True`. Thus it doesn't influence the behavior, it's just a trick to sneak the appending of x into that code. I've written a few more versions, might post them as a separate "answer" later. — Kelly Bundy, Dec 23 '21 at 20:59

Alain T. · Answer 2 · 2021-12-23T19:15:38.123

1

If your list of sets can have more than two items, perhaps a recursive function would be best:

def subMatch(A,S):
    if not S: return True          # all sets matched
    for i,n in enumerate(A):       # check for membership in 1st set    
        if n in S[0] and subMatch(A[i+1:],S[1:]): # and matching rest
            return True            # found full match
    return False

Output:

l1 = [1, 2, 3, 4]

l2 = [{2, 1}, {1, 3}]

print(subMatch(l1,l2)) # True

l1 = [1, 2, 3, 4]

l2 = [{2, 1}, {1, 3}, {3, 4}]

print(subMatch(l1,l2)) # True

l1 = [1, 2, 4, 3]

l2 = [{2, 1}, {1, 3}, {3, 4}]

print(subMatch(l1,l2)) # False

edited Dec 23 '21 at 19:15

answered Dec 23 '21 at 18:45

Alain T.

40,517
4
31
51

Potentially very slow, for example `l1 = list(range(40))` and `l2 = [{i, i+1} for i in range(0, 42, 2)]` takes [more than a second](https://tio.run/##TY8xb4MwEIV3/4onZbFVD0AzVEgdsnTrRDfE4IQjWDEG2Waoqv52ajsq4gZLvnvv3XfLdxhn@/q2uG3raYBfr58q3EZ@kY2oGWLpAXYOaGo4Cquz@HIrYa8TlDHwFDym5KQ@u4bZQUsLbUF2ncipQPwi6t0VlbdHlk00Xcn5US9JXfqQ0th/flqf@k1bdFC2PyC2@qWsO9m08Y3JpzzOENreI21MmoeMxnCowxns8P9QxhNjpsQ7jPaBO2XvxM@FEMxUsdn@aIm48vd5XIJ6SgqJcyVRiY4tTtvAd0RTSlMJsW1/). – Kelly Bundy Dec 23 '21 at 19:03
Agreed, I much prefer yours although it took me a second to get why it works. – Alain T. Dec 23 '21 at 19:08
I guess it helped that I've known the short solution for the *ordinary* subsequence problem for years :-). You could speed yours up by moving the recursive call down, replacing `True`, but it seems trying multiple paths is the reason you've done this recursively in the first place? – Kelly Bundy Dec 23 '21 at 19:12
Path seeking was indeed what I had in mind but I realize that it is not very efficient for this kind of problem. – Alain T. Dec 23 '21 at 19:25
Did some benchmarks in my answer now, included yours and a greedy version of it. – Kelly Bundy Dec 23 '21 at 20:01

Kelly Bundy · Accepted Answer · 2021-12-24T08:23:25.040

Witness Versions

Instead of just knowing whether there's a match, one might want to also know what that match is. Maybe to truly use it, or maybe just to be able to verify the correctness. Here are a few versions, returning a list of the actual matched elements, or None if there's no match. The first two might be the best.

Perhaps the simplest, assume each pool from l2 does have a match in l1, and catch the exception:

def subseq1(l1, l2):
    it1 = iter(l1)
    try:
        return [next(x for x in it1 if x in pool)
                for pool in l2]
    except StopIteration:
        pass

Basic loops:

def subseq2(l1, l2):
    witness = []
    it1 = iter(l1)
    for pool in l2:
        for x in it1:
            if x in pool:
                witness.append(x)
                break
        else:
            return
    return witness

Slightly modifying my original if(all(any(... solution to append each witnessing element:

def subseq3(l1, l2):
    witness = []
    it1 = iter(l1)
    if all(any(x in pool and not witness.append(x)
               for x in it1)
           for pool in l2):
        return witness

Pre-append a spot for the next witness element:

def subseq4(l1, l2):
    witness = []
    it1 = iter(l1)
    if all(any(witness[-1] in pool
               for witness[-1] in it1)
           for pool in l2
           if not witness.append(None)):
        return witness

Pre-allocate spots for all witness elements:

def subseq5(l1, l2):
    witness = [None] * len(l2)
    it1 = iter(l1)
    if all(any(witness[i] in pool
               for witness[i] in it1)
           for i, pool in enumerate(l2)):
        return witness

Check for false witnesses at the end:

def subseq6(l1, l2):
    it1 = iter(l1)
    false = object()
    witness = [next((x for x in it1 if x in pool), false)
               for pool in l2]
    if false not in witness:
        return witness

Testing code (cases taken from Alain's answer):

funcs = [
    subseq1,
    subseq2,
    subseq3,
    subseq4,
    subseq5,
    subseq6,
]

for func in funcs:
    l1 = [1, 2, 3, 4]
    l2 = [{2, 1}, {1, 3}]
    print(func(l1,l2)) # True

    l1 = [1, 2, 3, 4]
    l2 = [{2, 1}, {1, 3}, {3, 4}]
    print(func(l1,l2)) # True

    l1 = [1, 2, 4, 3]
    l2 = [{2, 1}, {1, 3}, {3, 4}]
    print(func(l1,l2)) # False

    print()

Subsequence match for a simple list and a list of sets?

3 Answers3

Solutions

Benchmarks

Witness Versions

Linked