8

I'm getting unexpected backtracking of the + quantifier of a Raku regex.

In this regex:

'abc' ~~ m/(\w+) {say $0}  <?{ $0.substr(*-1) eq 'b' }>/;

say $0;

I get the expected result:

「abc」  # inner say
「ab」   # inner say

「ab」   # final say

That is, the (greedy) + quantifier gets all letters and then the conditional fails. After that it starts backtracking by freeing the last got letter, until the conditional evaluates to true.

However, the backtracking does not seem to work the same way when I put the quantifier outside the capturing group:

'abc' ~~ m/[(\w)]+ {say $0}  <?{ $0.tail eq 'b' }>/;

say $0;

Result:

[「a」 「b」 「c」]  # inner say
[「a」 「b」 「c」]  # why this extra inner say? Shouldn't this backtrack to [「a」 「b」]?
[「a」 「b」 「c」]  # why this extra inner say? Shouldn't this backtrack to [「a」 「b」]?
[「b」 「c」]      # Since we could not successfully backtrack, We go on matching by increasing the position
[「b」 「c」]      # Previous conditional fails. We get this extra inner say
[「c」]          # Since we could not successfully backtrack, We go on matching by increasing the position

Nil            # final say, no match because we could not find a final 'b'

Is this behaviour expected? If so: Why do they work differently? Is it possible to mimic the first regex but still keep the quantifier outside the capturing group?

NOTE:

Using a lazy quantifier 'solves' the issue... This is expected because the difference seems to happen with backtracking, and that does not happen with the lazy quantifier.

'abc' ~~ m/[(\w)]+? {say $0}  <?{ $0.tail eq 'b' }>/;

[「a」]
[「a」 「b」]

[「a」 「b」]

However for performance reasons I'd rather use a greedy quantifier (the example in this question is a simplification).

Boann
  • 48,794
  • 16
  • 117
  • 146
Julio
  • 5,208
  • 1
  • 13
  • 42
  • in your second example I don't understand the use of square brackets. If I put square brackets inside the parentheses I get the same result as parentheses alone, i.e. `([\w+])` and `([\w])+` mimic their non-square bracket containing counterparts. – jubilatious1 Dec 10 '20 at 19:15
  • 1
    @jubilatious1 Yes, the brackets are useless :). The original regex was more complex and I started to strip parts of it to get the simplest case. I finally forgot to remove the brackets. – Julio Dec 10 '20 at 20:43

1 Answers1

7

I don't think the issue is with backtraking. But looks like the intermediate $0 exposed has retained the previous iteration captures. Consider this expression,

'abc' ~~ m/[(\w)]+ {say "Match:",$/.Str,";\tCapture:",$0}  <?{ False }>/;

This is the output:

Match:abc;  Capture:[「a」 「b」 「c」]
Match:ab;   Capture:[「a」 「b」 「c」]
Match:a;    Capture:[「a」 「b」 「c」]
Match:bc;   Capture:[「b」 「c」]
Match:b;    Capture:[「b」 「c」]
Match:c;    Capture:[「c」]

As you can see the match is in proper order, abc ab a .... But the captured array for ab match is also [「a」 「b」 「c」]. I suspect this to be a bug.


For your case, there are a couple of approaches.

  1. Just use $/ for the condition check
    'abc' ~~ m/[(\w)]+  <?{ $/.Str.substr(*-1) eq 'b' }>/;
    
  2. Or, additionally also capture the group with quatifiers as well.
    'abc' ~~ m/([(\w)]+) <?{ $0[0][*-1] eq 'b' }>/;
    
    Here $0 matches the outer group, $0[0] matches the first inner group, $[0][*-1] matches the character that was finally matched in this iteration.

Prasanna
  • 2,390
  • 10
  • 11
  • 3
    Even easier than `$/.Str.substr(*-1) eq 'b'` is `$/.ends-with: 'b'` – user0721090601 Dec 10 '20 at 09:08
  • 1
    Hi! Your workaround of wrapping the quantizier with an extra capturing group works like a charm! Let's see what the rakudo people say about this is a bug or not – Julio Dec 10 '20 at 10:56
  • 2
    I've filed an issue [With `(foo)+` the corresponding sub-captures aren't removed during backtracking](https://github.com/rakudo/rakudo/issues/4105). – raiph Dec 11 '20 at 04:01
  • 1
    @raiph, Thank you – Prasanna Dec 11 '20 at 04:34
  • I'm not sure I understand the issue, @Prasanna and @raiph. If I eliminate backtracking with the `:r` racheting adverb, I see the expected result. Try `'abc' ~~ m:r/(\w)+ {say $/;} { False }>/;` , Thanks! – jubilatious1 Dec 11 '20 at 05:05
  • 2
    @jubilatious1, the regex given, is a simplified one for demonstrating the issue. Based on the demo output, hope you agree there is a discrepancy. And yes, the issue occurs only when we use backtracking and `$0`. – Prasanna Dec 11 '20 at 05:42
  • @Prasanna, possibly relevant SO answer: https://stackoverflow.com/a/64407931/7270649 – jubilatious1 Dec 11 '20 at 06:28