We have a fairly complex regular expression that checks the structure of a string.

I wonder if there is an easy way to find out which character in the string is causing the regular expression not to match.

For example,

 string.match(reg_exp).get_position_which_fails

Basically, the idea is to get the "position" of the state machine at the point where it gave up.

Here is an example of the regular expression:

%q^[^\p{Cc}\p{Z}]([^\p{Cc}\p{Zl}\p{Zp}]{0,253}[^\p{Cc}\p{Z}])?$
user2196351
  • What fails in `'abad'.match /ae/`? This problem simply has no adequate solution. – Aleksei Matiushkin May 22 '15 at 16:15
  • It may help if you could post the regular expression. – hwnd May 22 '15 at 16:16
  • You want to find how many characters are matched before the first failure. If the expression to match is simple enough, you could create a regex that will always match, and that will gather good groups on the way. Something like `/(f?)(a?)(i?)(l?)(h?)(e?)(r?)(e?)/` which will match the first 4 characters of "failNow". [Test](https://regex101.com/r/uM4vB7/1) – James Newton May 22 '15 at 16:20
  • See e.g. http://stackoverflow.com/questions/2348694/how-do-you-debug-a-regex for debugging a regexp. – Fredrik Pihl May 22 '15 at 16:21
  • Well, if you don't mind, you can use http://rubular.com/ to try your regex – Uelb May 22 '15 at 16:22
  • @JamesNewton: your approach is wrong since it can match for example: "ahr" – Casimir et Hippolyte May 22 '15 at 16:25
  • @CasimirEtHippolyte: Sure, but then the first group will be empty, so you can see where the match that you were hoping to make would fail. My technique would require you analyze the result, not simply accept the match. – James Newton May 22 '15 at 16:33
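
The "always matches" idea from the comments can be sketched in Ruby roughly as follows. This is only an illustration for the literal target "failhere"; the names `PROBE` and `matched_prefix_length` are made up for this example, and the approach does not scale to patterns with alternation or repetition.

```ruby
# Every piece is optional, so the probe itself can never fail to match.
PROBE = /\A(f?)(a?)(i?)(l?)(h?)(e?)(r?)(e?)/

# Hypothetical helper: how many leading pieces of "failhere" were matched
# in order before the first piece that came back empty.
def matched_prefix_length(str)
  groups = PROBE.match(str).captures
  # Later groups may still be non-empty (e.g. for "ahr"), so stop counting
  # at the first empty one instead of counting all non-empty groups.
  groups.take_while { |g| !g.empty? }.length
end

puts matched_prefix_length("failNow")
puts matched_prefix_length("ahr")
```

As noted in the exchange above, the result has to be analysed rather than trusted as a match: "ahr" matches the probe, but its first group is empty, so the reported prefix length is zero.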

1 Answer

The short answer is: No.

The long answer is that a regular expression is a complicated finite state machine that may be in a state trying to match several different possible paths simultaneously. There's no way of getting a partial match out of a regular expression without constructing a regular expression that allows partial matches.

If you want to allow partial matches, either re-engineer your expression to support them, or write a parser that steps through the string using a more manual method.

You could try generating one of these automatically with Ragel if you have a particularly difficult expression to solve.
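
For the expression in the question, a hand-written check of the manual kind might look something like the sketch below. It is a reading of that pattern under two assumptions: the whole string is validated (as if the anchors were `\A` and `\z` rather than the line anchors `^` and `$`), and the length limit is 255 characters (1 + 253 + 1). The names `EDGE_OK`, `MIDDLE_OK`, `MAX_LENGTH` and `first_failing_index` are invented for the example.

```ruby
# Character rules taken from the pattern in the question.
EDGE_OK    = /\A[^\p{Cc}\p{Z}]\z/         # allowed as first or last character
MIDDLE_OK  = /\A[^\p{Cc}\p{Zl}\p{Zp}]\z/  # allowed anywhere in between
MAX_LENGTH = 255                          # assumed: 1 + 253 + 1

# Hypothetical helper: returns nil when the string passes every rule,
# otherwise the index of the first character that breaks one.
def first_failing_index(str)
  chars = str.chars
  return 0 if chars.empty?                # no character for the first class
  chars.each_with_index do |ch, i|
    return i if i >= MAX_LENGTH           # string is longer than allowed
    edge = i.zero? || i == chars.length - 1
    return i unless ch.match?(edge ? EDGE_OK : MIDDLE_OK)
  end
  nil
end

p first_failing_index("ab cd")
p first_failing_index("a\tb")
p first_failing_index("ab ")
```

Because each character is tested on its own, there is always a single well-defined position to report, which is exactly what the regex engine cannot provide. The cost is that the rules now live in two places unless the original pattern is replaced by this check.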

tadman