Bear with me on this as I might not be explaining this too well.

I have a simple regex:

^(The\s)?(cat\s)?(sat\s)?(on\s)?(the\s)?(mat\.)?

To which the text

The cat sat on the mat.

passes successfully. Hurrah!

However, what I'm after is a way to find out which groups the regex failed on. For example:

The cat sat on the mat # fails on group 6 (no period)
The cat sat on teh mat. # fails on group 5 (teh instead of the)
The kat sat on the mat. # fails on group 2 (kat instead of cat)

The last example was otherwise fine except for that one failed group. My question is this: is there a way in Python to determine, group by group, whether the string would otherwise have matched, without having to write a separate variant of the regex for each group in turn?

regex101 now with added cats

Ghoul Fool
  • Why don't you just use something like [`if word in string`](https://stackoverflow.com/questions/5319922/python-check-if-word-is-in-a-string) – ctwheels Mar 20 '18 at 14:25
  • Why not count the total captured groups? If the total is one, then your failed group would be `1` + 1. – revo Mar 20 '18 at 14:27
  • @ctwheels this is a basic example for clarity. The real deal is a much more complex regex, trust me. – Ghoul Fool Mar 20 '18 at 14:31
  • @GhoulFool then you should present an equivalent example. At the moment, the best approach is to use `if word in string`. Also, letting us know *what* you're trying to accomplish will help us provide you with the best way to tackle your problem. – ctwheels Mar 20 '18 at 14:32
  • Why do you need to use a regex at all? (are you given a regex as input?) – user202729 Mar 20 '18 at 14:32
  • Software Engineering SE: [When you should NOT use Regular Expressions?](https://softwareengineering.stackexchange.com/questions/113237/when-you-should-not-use-regular-expressions) – user202729 Mar 20 '18 at 14:33
  • @revo - but if you count groups then the three examples would all be the same amount (5/6) rather than the failed group in question? – Ghoul Fool Mar 20 '18 at 14:33
  • No they are not the same. I mean total captured groups not total defined groups. – revo Mar 20 '18 at 14:35

1 Answer

If you just want to know where the first failure occurred, you can use re.findall(), which returns an empty string for every group that did not match:

import re

regex = r'^(The\s)?(cat\s)?(sat\s)?(on\s)?(the\s)?(mat\.)?'
text = 'The cat sat on teh mat.'

re.findall(regex, text)
# [('The ', 'cat ', 'sat ', 'on ', '', '')]

So you can find out the index of the first failure by doing:

re.findall(regex, text)[0].index('')
# 4  (zero-based index into the tuple, i.e. group 5)

(Note this approach may not be useful if you have overlapping matches, backtracking or other more unusual patterns in your regex).
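As a rough sketch of the same idea wrapped in a helper, the snippet below uses re.match() and Match.groups() instead of re.findall(), so unmatched groups come back as None rather than ''. The function name first_failed_group is made up for illustration, and the sketch assumes every group in the pattern is optional and that groups are tried strictly left to right, as in the question's regex:

```python
import re

# Pattern from the question: every group is optional.
PATTERN = re.compile(r'^(The\s)?(cat\s)?(sat\s)?(on\s)?(the\s)?(mat\.)?')


def first_failed_group(text):
    """Return the 1-based number of the first group that did not match.

    Returns None when every group matched. Hypothetical helper, not part
    of the re module.
    """
    # Because all groups are optional, the pattern can match the empty
    # string, so match() always returns a Match object here.
    m = PATTERN.match(text)
    for number, value in enumerate(m.groups(), start=1):
        if value is None:  # this group did not take part in the match
            return number
    return None


for sample in (
    'The cat sat on the mat.',
    'The cat sat on the mat',
    'The cat sat on teh mat.',
    'The kat sat on the mat.',
):
    print(repr(sample), '->', first_failed_group(sample))
```

This only reports where matching first stopped. It does not tell you whether the rest of the string would have matched had that one group succeeded, which is the harder part of the question.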

match