Reg ex partial fail

Question

Bear with me on this as I might not be explaining this too well.

I have a simple regex:

^(The\s)?(cat\s)?(sat\s)?(on\s)?(the\s)?(mat\.)?

To which the text

The cat sat on the mat.

passes successfully. Hurrah!

However, what I'm after is a way to find out which groups the regex failed on. For example:

The cat sat on the mat # fails on group 6 (no period)
The cat sat on teh mat. # fails on group 5 (teh instead of the)
The kat sat on the mat. # fails on group 2 (kat instead of cat)

The last example was otherwise fine except for that one failed group. My question is this: is there a way in Python to determine, group by group, whether the string would otherwise have matched, without having to create a separate version of the regex for each group in turn?

regex101 now with added cats

Why don't you just use something like [`if word in string`](https://stackoverflow.com/questions/5319922/python-check-if-word-is-in-a-string) — ctwheels, Mar 20 '18 at 14:25
Why not count the total captured groups? If the total count is one, then your failed group would be `1` + 1. — revo, Mar 20 '18 at 14:27
@ctwheels this is a basic example for clarity. The real deal is a much more complex regex, trust me. — Ghoul Fool, Mar 20 '18 at 14:31
@GhoulFool then you should present an equivalent example. At the moment, the best approach is to use `if word in string`. Also, letting us know *what* you're trying to accomplish will help us provide you with the best way to tackle your problem. — ctwheels, Mar 20 '18 at 14:32
Why do you need to use a regex at all? (are you given a regex as input?) — user202729, Mar 20 '18 at 14:32
Software Engineering SE: [When you should NOT use Regular Expressions?](https://softwareengineering.stackexchange.com/questions/113237/when-you-should-not-use-regular-expressions) — user202729, Mar 20 '18 at 14:33
@revo - but if you count groups then the three examples would all be the same amount (5/6) rather than the failed group in question? — Ghoul Fool, Mar 20 '18 at 14:33
No they are not the same. I mean total captured groups not total defined groups. — revo, Mar 20 '18 at 14:35

score 0 · Answer 1 · answered Mar 20 '18 at 14:42

If you just want to know where the first failure occurred, you can use `re.findall()`:

import re

regex = r'^(The\s)?(cat\s)?(sat\s)?(on\s)?(the\s)?(mat\.)?'
text = 'The cat sat on teh mat.'

re.findall(regex, text)
# [('The ', 'cat ', 'sat ', 'on ', '', '')]

So you can find out the index of the first failure by doing:

re.findall(regex, text)[0].index('')
# 4 (zero-based index, i.e. group 5)

Note that this approach may not be useful if your regex involves overlapping matches, backtracking, or other more unusual patterns. Also, `index('')` raises `ValueError` when every group matched, so that case needs handling.
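A variation on the same idea, as a rough sketch rather than a definitive solution: `re.match()` reports a group that did not take part in the match as `None`, so it can be told apart from a group that matched an empty string. The helper name `first_failed_group` is made up for this illustration, and the sketch assumes the groups are all optional and tried in order, as in the question's pattern.

```python
import re

# Pattern from the question: six optional groups tried in sequence.
PATTERN = re.compile(r'^(The\s)?(cat\s)?(sat\s)?(on\s)?(the\s)?(mat\.)?')


def first_failed_group(pattern, text):
    """Return the 1-based number of the first group that did not match.

    Returns None when every group matched. Hypothetical helper for
    illustration only; assumes optional groups tried in order.
    """
    match = pattern.match(text)
    if match is None:
        # Cannot happen when every group is optional, but other patterns may hit it.
        raise ValueError('pattern did not match at all')
    # groups() yields None for each group that did not participate in the match.
    for number, value in enumerate(match.groups(), start=1):
        if value is None:
            return number
    return None


for sample in (
    'The cat sat on the mat.',
    'The cat sat on the mat',
    'The cat sat on teh mat.',
    'The kat sat on the mat.',
):
    print(repr(sample), '->', first_failed_group(PATTERN, sample))
```

Like the `findall()` version, this only reports the first failure: once one group fails, the later groups are tried at the same position and will usually fail as well, so it cannot tell you whether the rest of the string would otherwise have matched.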
