When I use the regex ID:\s*\d*0 to match ID: 12344y, RegexBuddy gives me this result:

https://raw.github.com/litsand/litsand.github.com/master/_posts/pic/4.png

It backtracks \d* to look for a match, but doesn't backtrack \s*.

When I change the regex to ID:\s*\d*q, it doesn't backtrack at all and just reports a failure:

https://raw.github.com/litsand/litsand.github.com/master/_posts/pic/5.png

I know that even if it did backtrack, the regex would still fail in the end. But how does RegexBuddy know it will fail without backtracking?

I read Mastering Regular Expressions and didn't find an answer. Thanks for your help.

Sorry for linking the pictures; I don't have the privilege to upload images.

litsand

2 Answers

RegexBuddy's regex engine internally optimizes your regular expressions into ID:\s*+\d*0 and ID:\s*+\d*+q using possessive quantifiers. It can do this because \s and \d are mutually exclusive, as are \d and q. Mastering Regular Expressions calls this "automatic possessification".

In RegexBuddy 3, the regex debugger also uses this optimization. That's why you didn't see the backtracking steps in the debugger. In RegexBuddy 4, the regex debugger has all optimizations disabled, so it shows all the backtracking your regular expression would do in a regex engine that doesn't have "automatic possessification".
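
To see why mutual exclusivity makes the rewrite safe, here is a minimal Python sketch. It is not RegexBuddy's implementation. Because possessive quantifiers are only available in newer versions of Python's re module, the sketch emulates X*+ with the well-known lookahead-plus-backreference idiom (?=(?P<name>X*))(?P=name). The sample subjects, the spans helper and the deliberately wrong OVER_POSSESSIVE_0 pattern are assumptions made for illustration.

```python
import re

# Patterns from the question.
ORIGINAL_0 = r"ID:\s*\d*0"
ORIGINAL_Q = r"ID:\s*\d*q"

# Emulated possessive quantifiers: the lookahead grabs the greedy run once,
# and the backreference must consume exactly that run, so nothing is given back.
POSSESSIVE_0 = r"ID:(?=(?P<ws>\s*))(?P=ws)\d*0"                       # ID:\s*+\d*0
POSSESSIVE_Q = r"ID:(?=(?P<ws>\s*))(?P=ws)(?=(?P<d>\d*))(?P=d)q"      # ID:\s*+\d*+q

# Unsafe rewrite: \d and 0 overlap, so \d* must stay free to give back a "0".
OVER_POSSESSIVE_0 = r"ID:\s*(?=(?P<d>\d*))(?P=d)0"                    # ID:\s*\d*+0

# Hypothetical sample subjects, including the one from the question.
SUBJECTS = ["ID: 12344y", "ID: 123440", "ID:0", "ID:   7q", "ID:q", "ID: 12 0"]


def spans(pattern, subjects):
    """Return the match span (or None) of pattern for each subject."""
    results = []
    for subject in subjects:
        match = re.search(pattern, subject)
        results.append(match.span() if match else None)
    return results


if __name__ == "__main__":
    print(spans(ORIGINAL_0, SUBJECTS) == spans(POSSESSIVE_0, SUBJECTS))
    print(spans(ORIGINAL_Q, SUBJECTS) == spans(POSSESSIVE_Q, SUBJECTS))
    print(re.search(ORIGINAL_0, "ID: 123440"), re.search(OVER_POSSESSIVE_0, "ID: 123440"))
```

On these subjects the possessive forms agree with the originals, while making \d* possessive in front of 0 loses the match on ID: 123440. That is why only \s* is made possessive in the first regex.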

Jan Goyvaerts

I'm guessing there are some optimizations that the application doesn't illustrate properly.

For example, in Perl the optimizer rejects both expressions instantly, giving the following output:

C:\>perl -Mre=debug -e"'ID: 12344y'=~/ID:\s*\d*0/"
Compiling REx "ID:\s*\d*0"
Final program:
   1: EXACT <ID:> (3)
   3: STAR (5)
   4:   SPACE (0)
   5: STAR (7)
   6:   DIGIT (0)
   7: EXACT <0> (9)
   9: END (0)
anchored "ID:" at 0 floating "0" at 3..2147483647 (checking anchored) minlen 4
Guessing start of match in sv for REx "ID:\s*\d*0" against "ID: 12344y"
Found anchored substr "ID:" at offset 0...
Contradicts floating substr "0", giving up...
Match rejected by optimizer
Freeing REx: "ID:\s*\d*0"

The optimizer checks for the presence of ID: and 0, doesn't find 0, and rejects the match before even executing the compiled expression. The same happens with the second example.
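
The idea behind that pre-check can be sketched in Python. This is a simplified illustration, not Perl's actual optimizer: the quick_reject and guarded_search helpers and their parameters are hypothetical, and the literals and minimum length are copied by hand from the debug output above rather than derived from the pattern.

```python
import re


def quick_reject(subject, anchored, floating, minlen):
    """Return True when the subject cannot match, judged from literals alone.

    anchored: literal that must start the match ("ID:" in the output above)
    floating: literal that must appear somewhere after it ("0" above)
    minlen:   minimum length of any match (4 above)
    """
    if len(subject) < minlen:
        return True
    start = subject.find(anchored)
    if start == -1:
        return True
    # If the floating literal is missing after the first anchored occurrence,
    # it is also missing after every later occurrence.
    return subject.find(floating, start + len(anchored)) == -1


def guarded_search(pattern, subject, anchored, floating, minlen):
    """Run the regex only when the cheap literal check cannot rule it out."""
    if quick_reject(subject, anchored, floating, minlen):
        return None  # rejected without running the regex at all
    return re.search(pattern, subject)


if __name__ == "__main__":
    print(guarded_search(r"ID:\s*\d*0", "ID: 12344y", "ID:", "0", 4))
    print(guarded_search(r"ID:\s*\d*q", "ID: 12344y", "ID:", "q", 4))
    print(guarded_search(r"ID:\s*\d*0", "ID: 123440", "ID:", "0", 4))
```

For ID: 12344y both patterns are rejected by plain substring searches, so no backtracking can take place. Only a subject that contains the required literals, such as ID: 123440, reaches the regex itself.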

Qtax