This program

say "zipi zape" ~~ /(\w)\w» \s+ \w+({$0})/;

returns

「pi zape」
 0 => 「p」
 1 => 「」

which I interpret as the backreference to the first match being matched to a zero-width match. Maybe that's because it's matched against $0, which is itemized to '' outside the regex? How could I use these backreferences and, at the same time, capture the match? Note: this is related to this documentation issue, which requires clarification of the use of backreferences.

Håkon Hægland
jjmerelo
  • At the time of writing this all the issues raised and questions asked in your original question **and** in your comments on Håkon's answer are carefully explained in [my answer to "Why/how is an additional variable needed in matching repeated arbitary character with capture groups?"](https://stackoverflow.com/a/56397290/1077672). Do you think it's worth walking thru your version of the problem and questions? I will happily write an answer if you read my linked answer and think yours is different but for now I think it's essentially a duplicate. – raiph Jul 07 '19 at 12:10

2 Answers


According to the documentation:

If you need to refer to a capture from within another capture, store it in a variable first

So you could use:

say "zipi zape" ~~ /(\w){} :my $c = $0; \w » \s+ \w+($c)/;

Output:

「pi zap」
 0 => 「p」
 1 => 「p」
Håkon Hægland
  • Actually, I wanted to understand the above behavior. But there's an additional thing I don't quite understand: why do you use opening and closing braces after the first capture? – jjmerelo Jul 07 '19 at 10:15
  • It seems this usage of `{}` is only mentioned in a [comment](https://github.com/perl6/doc/blob/master/doc/Language/regexes.pod6#L1088): it says we need it to update the current match object. – Håkon Hægland Jul 07 '19 at 10:23
  • Hi @jjmerelo "I wanted to understand above behavior." In brief, `(...)` is shorthand for creating a new `Match` object and setting `$/` to that new object inside the parens. So your second pair of parens means that the `$0` inside them does *not* refer back to the (capture from the) first pair of parens. – raiph Jul 07 '19 at 12:21
  • @jjmerelo "why do you use open and closing braces behind the first capture" I've explained this (both in summary and in great depth) in prior SOs; see my comment on your question as a starting point. – raiph Jul 07 '19 at 12:22
  • @raiph Thanks for the reference to the other question! – Håkon Hægland Jul 07 '19 at 12:31

{$0} isn't a backreference.
It is a code block.
In this case, it is a code block which does absolutely nothing.

In order for it to actually be used as a part of the regex it needs <> around it.


In fact, since () denotes something like a new closure with respect to $/, it would be an empty regex if it were actually being used for something.
($/ is reset for every (), so $0 is also reset.)
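
A minimal sketch of that reset, assuming a recent Rakudo. The string "ab" and the `say` calls are hypothetical and only there for illustration. Each code block reports whether `$0` is defined at the point where it runs, once outside and once inside a second pair of parentheses:

```raku
# First block: runs right after the first capture and outside any other
# group, so $0 there is the capture made by the first ().
# Second block: runs inside the second (), where $/ is a fresh Match
# object, so $0 no longer points at the first capture.
"ab" ~~ / (\w) { say "outside: ", $0.defined } ( { say "inside: ", $0.defined } \w ) /;
```

The first block should see the capture and the second should not, which is why a code block placed inside the second group has nothing useful in `$0` to work with.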

say "zipi zape" ~~ /(\w)\w» \s+ \w+(<{$0}>)/;
Cannot resolve caller INTERPOLATE_ASSERTION(Match:D: Nil:U, BOOTInt, BOOTInt, BOOTInt, BOOTInt, PseudoStash:D); none of these signatures match:
    (Match: Associative:D, $, $, $, $, $, *%_)
    (Match: Iterable:D \var, int \im, int \monkey, int \s, $, \context, *%_)
    (Match: Mu:D \var, int \im, int \monkey, $, $, \context, *%_)
  in block <unit> at <unknown file> line 1

That happens because it is basically the same as (<{Nil}>).


What you could do is update $/ before the second () by using {}, and use double quotes around $0:

say "zipi zape" ~~ /(\w){}\w» \s+ \w+("$0")/;
「pi zap」
 0 => 「p」
 1 => 「p」

To me this seems a little unreliable.
(It relies on what I would consider a misfeature, if not an outright bug.)


This is where we get to Håkon Hægland's answer of storing it in a lexical variable.
(After updating $/ by using {}.)

say "zipi zape" ~~ /(\w){} :my $c = $0; \w » \s+ \w+($c)/;

Lexical variables are not scoped to (), so it is perfectly safe to do this.

I would personally stringify $0 since that is the only part of the match object inside of $0 that is being used.

say "zipi zape" ~~ /(\w){} :my $c = ~$0; \w » \s+ \w+($c)/;

Honestly I don't see a reason to even capture the second match, since it is always going to be the same as the first match.

say "zipi zape" ~~ /(\w)\w» \s+ \w+$0/;

I also see little point in adding » since the \s+ already forces it to be the end of a word.

say "zipi zape" ~~ /(\w)\w \s+ \w+$0/;
Brad Gilbert