What's the difference between $/ and $¢ in regex?

Question

As the title indicates, what is the difference between $/ and $¢? They appear to always have the same value:

my $text = "Hello world";

$text ~~ /(\w+) { say $/.raku } (\w+)/;
$text ~~ /(\w+) { say $¢.raku } (\w+)/;

Both result in Match objects with the same values. What's the logic in using one over the other?

score 14 · Accepted Answer · answered Apr 27 '20 at 03:07

The variable $/ refers to the most recent match while the variable $¢ refers to the most recent outermost match. In most basic regexes like the above, that may be one and the same. But as can be seen from the output of the .raku method, Match objects can contain other Match objects (that's what you get when you use $<foo> or $1 for captures).

Suppose instead we had the following regex with a quantified capture:

/ ab (cd { say $¢.from, " ", $¢.to } ) + /

If we matched it against "abcdcdcd", we would see the following output:

0 2
0 4
0 6

But if we change from using $¢ to $/, we get a different result:

2 2
4 4
6 6

(The reason the .to seems to be a bit off is that it, like .pos, is not updated until the end of the capture block.)
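To reproduce the comparison as a complete script, a minimal sketch might look like the following. The array names @outer and @inner are illustrative and not part of the original example; positions are collected rather than printed so the two runs can be compared side by side. Going by the outputs quoted above, @outer should hold 0 three times and @inner should hold 2, 4, 6, though this is worth confirming on your own Rakudo version.

```raku
my $text = "abcdcdcd";

my @outer;   # .from as seen through $¢ on each iteration
my @inner;   # .from as seen through $/ on each iteration

# Same regex twice; only the variable inside the code block differs.
$text ~~ / ab (cd { @outer.push: $¢.from } )+ /;
$text ~~ / ab (cd { @inner.push: $/.from } )+ /;

say '$¢.from: ', @outer.join(", ");
say '$/.from: ', @inner.join(", ");
```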

In other words, $¢ will always refer to what will be your final match object (i.e., $final = $text ~~ $regex), so you can traverse a complex capture tree inside of the regex exactly as you would after having finished the full match. So in the above example, because the capture is quantified, $¢[0] is a list of matches: you could just do $¢[0][0] to refer to the first match, $¢[0][1] the second, etc.

Inside of a regex code block, $/ will refer to the most immediate match. In the above case, that's the match for inside the ( ) and won't know about the other matches, nor the original start of the matching: just the start for the ( ) block. So given a more complex regex:

/ a $<foo>=(b $<bar>=(c)+ )+ d /

We can access at any point using $¢ all of the foo tokens by saying $¢<foo>. We can access the bar tokens of a given foo by using $¢<foo>[0]<bar>. If we insert a code block inside of foo's capture, it will be able to access bar tokens by using $<bar> or $/<bar>, but it won't be able to access other foos.
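The capture paths mentioned here have the same shape as those on a finished match, so one way to get a feel for them is to inspect the result after matching. The sketch below does only that: the input string "abcbccd" and the variable name $m are made up for illustration, and it does not exercise $¢ itself.

```raku
# Hypothetical input: two foo groups, the first with one bar, the second with two.
my $m = "abcbccd" ~~ / a $<foo>=(b $<bar>=(c)+ )+ d /;

say $m<foo>.elems;           # number of foo captures
say ~$m<foo>[1];             # text of the second foo
say $m<foo>[1]<bar>.elems;   # number of bar captures inside the second foo
```

For this input it should report 2 foo captures, with the second being bcc and containing 2 bar captures. Inside the regex, $¢<foo>[0]<bar> follows the same path from the outermost match, while $<bar> in a block inside foo's capture starts from that foo's own match.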

user0721090601

Ohhh! I interpreted the doc's "The main difference between `$/` and `$¢` is scope: the latter only has a value inside the regex" to mean `$¢` was merely a vestigial trace, just as `Cursor` is. When I read your answer I thought `$¢` would be the `$*TOP` I created in the **A possible improvement?** section of [my answer](https://stackoverflow.com/a/56397290/1077672) to the SO "Why/how is an additional variable needed in matching repeated arbitary character with capture groups?". But my attempts to replace `$*TOP` with `$¢` failed. Do you understand my point in that answer? Can you make it work? – raiph Apr 27 '20 at 12:33
Raiph: So in grammars, `$¢` is renewed for each token, so you'd have to say `$*TOP := $¢` in the `TOP` token but that doesn't get rid of the need for the `$*TOP` var of course. I agree it would be awesome to be able to refer to matches at a top level. The problem is, ultimately, still the one you identify: when the positional/hash matches post to the match object. When using `$¢` — which is per-token — results will by definition post as soon as its enclosing `{ }` block is encountered. – user0721090601 Apr 27 '20 at 14:37
What's interesting to me is that in developing `Binex`, I haven't found it to be computationally any worse to post match results immediately upon encountering them. At the end of the day, you're pushing/popping either to a cached list/hash, or you're pushing/popping to the Match's list/hash. However, there may be some sort of internal speed up I'm not aware of used for LTM which is likely at the core of it (the `{ }` terminates a token for the purposes of LTM, and so is more likely to be run/tested than the rest of the token in a `|` grouping) – user0721090601 Apr 27 '20 at 14:40
Ahhh. I had jumped to the conclusion `$¢` was dynamic, and was surprised when it didn't work. But the penny's now dropped that it's lexical, as I could have guessed given your use of the word "outermost", and is, as you explain, established at the start of each rule. – raiph Apr 27 '20 at 15:51
So, iiuc, at the start of a rule, a new match object is created that records the matching engine's cursor position within the original input string, but is otherwise empty. (Right?) Then `$¢` and `$/` are bound to the same object, namely this new match object, which will record what this rule matches and captures as it progresses. Then, as matching progresses, `$¢` remains bound to this overall match object, whereas `$/` is rebound each time a new match object gets created, so it always corresponds, as you say, to the latest match object. Right? – raiph Apr 27 '20 at 16:07
raiph: It's weird, because `$¢` is actually defined as dynamic, despite functioning 100% as lexical (`$/` is also dynamic, but doesn't propagate either. AFAICT it's dynamic only so that the match calls on the regex can access it from lower levels and set it, via `$CALLERS::'/' := Match`). Your description of the binding process for $/ and $¢ is spot on. – user0721090601 Apr 27 '20 at 16:37
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/212639/discussion-between-raiph-and-user0721090601). – raiph Apr 27 '20 at 17:57