1

How to match strings of the form

"a"*m + "b"*n

With the constraint that m > n > 0.

Example matches:

aab
aaabb
aaaaaaaaaaaaaaaabbb

Example non-matches (violating the m > n constraint):

abb
aabb
aaaabbbb

I was able to do this in perl by using a recursive subpattern. But in Python that feature doesn't work:

>>> re.match("^a+(a(?1)?b)$", "aaabb")
error: unknown extension ?1 at position 6

Is there any way to do this in stdlib Python re module, or is there another pattern possible which doesn't require an external PCRE library?

brian d foy
  • 129,424
  • 31
  • 207
  • 592
wim
  • 338,267
  • 99
  • 616
  • 750
  • 1
    Did you try using the [PyPi regex module](https://pypi.org/project/regex/)? – The fourth bird Dec 21 '20 at 19:02
  • 1
    @Thefourthbird Yes, but the question asks about stdlib Python `re` ("_...which doesn't require an external PCRE library_") – wim Dec 21 '20 at 19:06
  • @wim You are right, I was too fast with my comment. Not sure if that is possible. – The fourth bird Dec 21 '20 at 19:09
  • "No" is an acceptable answer, if it comes with evidence that recursive pattern is _required_ here and Python `re` can't match this some other way with a single pattern (no pre-processing) – wim Dec 21 '20 at 19:18
  • I don't think this can be done in a single regexp. I don't think there's any way to make the number of repetitions in `{x,y}` variable, or to capture the number of repetitions of a pattern (to implement the `m > n` relationship). – Barmar Dec 21 '20 at 19:27
  • 1
    There's a language agnostic solution [here](https://stackoverflow.com/a/3644267/589924), but it doesn't work in Python – ikegami Dec 21 '20 at 19:27
  • Is the expression actually regular? – Dani Mesejo Dec 21 '20 at 19:50
  • @DaniMesejo I think it is not regular, but neither are modern re engines – wim Dec 21 '20 at 19:51
  • 2
    Perhaps like this `^a+(?:a(?=a*(\1?b)))+\1$` https://regex101.com/r/qHkBdy/1 – The fourth bird Dec 21 '20 at 19:53
  • The practical context in which this came up was [AoC day 19: Monster Messages](https://github.com/wimglenn/advent-of-code-wim/blob/0949da9f07787386ddee2e7f9370aa21ab53e153/aoc_wim/aoc2020/q19.py#L48-L50), where 'a' and 'b' are not single characters but actually much larger capture groups. – wim Dec 21 '20 at 20:22

1 Answers1

1

One option might be to match 1 or more a's at the beginning to match more a's than b's, and use a repeating group 1+ more times to match at least a single b to match the a > b rule.

^a+(?:a(?=a*(\1?b)))+\1$

Regex demo

The fourth bird
  • 154,723
  • 16
  • 55
  • 70
  • Seems sre (and `regex` too) cannot compile this pattern, I get `error: cannot refer to an open group at position 13`. – wim Dec 21 '20 at 20:10
  • @wim Ah I see, I tested it only on regex101 in Python. Not sure if there is a way around that. Let me check. – The fourth bird Dec 21 '20 at 20:14
  • This is probably implemented in a different way in PyPi regex. Try passing the `regex.V1` flag and see the results, they look somewhat different than the PCRE results. – Wiktor Stribiżew Dec 21 '20 at 22:11
  • @WiktorStribiżew I tried that using [regex.V1](https://tio.run/##lVJBCsIwELznFUvwkGgRijeh9AcePRVlq1ELNg3bLejra1urKCaiewozs5kJGXflU2UXbVuUriIGMkdzEcIhsyELCZDc4EylS1RpglOVxWmutZ5l8UQKwabmbc3U6ZREzDMrBbxMh2HuRd8mD2hCuNfoH/OQ3Z8pvuX7IDxSH@R9WjDYJxHMhD/6@7I/v6kbqYUokXcnU/f16PsyPxR2j@ezGmsTwaMY0ShYx93WoSIYNldNGd1PUFgwtikNIRs1XhtBzUicxHo5xHBUWL6T8yNVjVNat@0N) but I don't get any output, maybe I made a mistake. I read [this page](https://stackoverflow.com/questions/11511083/kodos-cannot-refer-to-open-group), I think it is what is stated about that the group is not closed yet. – The fourth bird Dec 21 '20 at 22:30
  • I tried `regex.match(r'^a+(?:a(?=a*(\1?b)))+\1$', s, regex.V1)` and it yields false for `aaabb` – Wiktor Stribiżew Dec 21 '20 at 22:32