Structural pattern matching python - match at any position in sequence

Question

I have a list of objects, and want to check if part of the list matches a specific pattern.

Consider the following lists:

l1 = ["foo", "bar"]
l2 = [{1, 2},"foo", "bar"]
l3 = ["foo", "bar", 5]
l4 = [{1,2},"foo", "bar", 5, 6]

How would I match the sequence ["foo", "bar"] in all the different cases?

My naive idea is:

match l4:
    case [*_, "foo", "bar", *_]:
        print("matched!")

Unfortunately this is a SyntaxError: multiple starred names in sequence pattern. The issue is, that I don't know how many elements are leading and trailing the pattern.

Edit: I think I need to clarify: "foo", "bar" is just a stand-in for a much more complex pattern. (I am working with an AST object)

is it only a valid list if `foo` and `bar` are back to back? — Rabinzel, Sep 03 '22 at 17:10
could you add a full working example? here a link to the [PEP](https://docs.python.org/3/whatsnew/3.10.html#pep-634-structural-pattern-matching) — cards, Sep 03 '22 at 17:26
@cards How shall they show a working example if they don't know how to make it work? — Kelly Bundy, Sep 03 '22 at 17:30
@Kelly Bundy I read `14` instead of `l4` and I thought the code was not complete:) — cards, Sep 03 '22 at 17:34

Ignatius Reilly · Accepted Answer · 2022-09-03T19:06:32.517

1

def struct_match(lst_target, lst_pattern):
    for i in range(len(lst_target)-(len(lst_pattern)-1)):
        if lst_target[i:i+len(lst_pattern)] == lst_pattern:
            print('matched!')
            break

l1 = ["foo", "bar"]
l2 = [{1, 2},"foo", "bar"]
l3 = ["foo", "bar", 5]
l4 = [{1,2},"foo", "bar", 5, 6]
l5 = [{1,2},"foo", "baz", "bar", 5, 6]

patt = ["foo", "bar"]

struct_match(l1, patt)
struct_match(l2, patt)
struct_match(l3, patt)
struct_match(l4, patt)
struct_match(l5, patt)


# outputs

matched!
matched!
matched!
matched!

PS: I just found a beautiful recursive solution here (recursion is always beautiful... if your list is not too long)

edited Sep 03 '22 at 19:06

answered Sep 03 '22 at 17:23

Ignatius Reilly

1,594
2
6
15

You should add a `break` or `return` in there as this might print `'matched!'` multiple times if the sequence exists multiple times in the list. – Chris Sep 03 '22 at 18:59
@Chris, I edited it to add your correction. Thank you!! – Ignatius Reilly Sep 03 '22 at 19:07

score 1 · Answer 2 · edited Sep 03 '22 at 22:52

1

This cannot be done directly with pattern-matching, but pairing pattern matching with recursion works, though this is not really a good use case for structural pattern matching. We pattern match on the empty list or list of length one as an exit condition to break the recursion.

On each recursive call we slice the list to remove one element from the front.

We are assuming seq is of length two. Without that this use of pattern matching becomes quite difficult.

def has_seq(lst, seq):
  match lst:
    case [] | [_]: return False
    case [x, y, *_]: 
      if [x, y] == seq:
        return True
      else:
        return has_seq(lst[1:], seq)

It's important to remember that names in case patterns do not check to see if an element is equal to an existing variable. Rather they bind a name to that element. If this has the same name as an existing variable, it shadows that variable.

We can put a guard on a pattern to clean up the above.

def has_seq(lst, seq):
  match lst:
    case [] | [_]: return False
    case [x, y, *_] if [x, y] == seq: return True
    case _: return has_seq(lst[1:], seq)

A match can be used to match against literals. If we wanted to check if the first two elements of a list were 1 and 2:

match lst:
  case [1, 2, *_]: ...

But considering has_seq does not hardcode the values to look for, this cannot be utilized.

Of course, I think this looks nicer using a generator expression and any to see if any subsequence of lst of the same length as seq is equal to seq. This has the benefit of handling sequences of any length.

def has_seq(lst, seq):
  return any(lst[i:i+len(seq)] == seq 
             for i in range(len(lst) - len(seq) + 1))

Or, using in, just:

def has_seq(lst, seq):
  return seq in (lst[i:i+len(seq)]
                 for i in range(len(lst) - len(seq) + 1)

edited Sep 03 '22 at 22:52

Kelly Bundy

23,480
7
29
65

answered Sep 03 '22 at 18:35

Chris

26,361
5
21
42

bit lost... where are `x` and `y` defined? – cards Sep 03 '22 at 18:45
1

`case[x, y, *_]` is _not_ determining if the first element _is_ equal to the value of `x`. It's binding the name `x` to the first element. Likewise for `y`. – Chris Sep 03 '22 at 18:47
I think my comment about `case _` was mistaken, I thought of the guard ifs like ifs nested inside the case. – Kelly Bundy Sep 03 '22 at 19:39
Anyway, try [this](https://tio.run/##JYxNDsIgEIX3nGJ2A5W4cWNMvIem6QKRhCYtYplG5vQ4pG/xFt/7yUzxky7XvLVWwhfuMKJDC/jCSS2FBHR8EuyFqMpCklvDm/a8BI2PZ29XYDR6kKpRqyMfQbY3BSLvSoCx8rlaEGcLg5@OqCtvcyKNx@g3U5Q7b1r7Aw) with 3.10. We're not limited to literals. – Kelly Bundy Sep 03 '22 at 19:40
(bah, also eeds a non-matching case to demonstrate that it actually compares instead of capturing, but I gotta go) – Kelly Bundy Sep 03 '22 at 19:44
I would suggest the ultimate lesson here is that this is not an ideal use for structural pattern-matching. – Chris Sep 03 '22 at 19:45
Oh I just reread the Q. It does ask specifically about "foo" and "bar", not about a 2-list `seq`, so we could hardcode those literals. – Kelly Bundy Sep 03 '22 at 19:48
But yes, I wouldn't use SPM here, either. – Kelly Bundy Sep 03 '22 at 19:49

Structural pattern matching python - match at any position in sequence

2 Answers2