Lookaround regex and character consumption

Question

Based on the documentation for Raku's lookaround assertions, I read the regex / <?[abc]> <alpha> / as saying "starting from the left, match but do not consume one character that is a, b, or c and, once you have found a match, match and consume one alphabetic character."

Thus, this output makes sense:

'abc' ~~ / <?[abc]> <alpha> /     # OUTPUT: «｢a｣␤ alpha => ｢a｣»

Even though that regex has two one-character terms, one of them does not capture so our total capture is only one character long.
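
One way to check this reading is to look at where the match starts and ends. This is a minimal sketch; `$m` is just an illustrative variable name, and the values in the comments follow from the output shown above:

```raku
# The lookahead <?[abc]> checks the next character without consuming it,
# so only <alpha> contributes to the matched text.
my $m = 'abc' ~~ / <?[abc]> <alpha> /;

say $m.Str;     # a
say $m.from;    # 0
say $m.to;      # 1
say $m<alpha>;  # ｢a｣
```

The overall match and the <alpha> capture cover the same single character, which is what a zero-width assertion should produce.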

But the next expression confuses me:

'abc' ~~ / <?[abc\s]> <alpha> /     # OUTPUT: «｢ab｣␤ alpha => ｢b｣»

Now, our total capture is two characters long, and one of those isn't captured by <alpha>. So is the lookaround capturing something after all? Or am I misunderstanding something else about how the lookaround works?

What does it mean that your first example with a *negative lookaround* returns `Nil`, i.e. `'abc' ~~ / <![abc]> /; #OUTPUT: Nil`, while your second example with a *negative lookaround* gives the same result as a *positive lookaround*: `'abc' ~~ / <![abc\s]> /; # OUTPUT: «｢ab｣␤ alpha => ｢b｣»`? – jubilatious1, Sep 26 '21 at 02:13

Markus Jarderot · Accepted Answer · 2021-12-20T12:26:57.163

<?[ ]> and <![ ]> do not seem to support some backslashed character classes. \n, \s, \d and \w all show similar results.

When \n, \s, \d or \w is added, the lookahead behaves like a plain, consuming character class: <?[abc\s]> behaves the same as <[abc\s]>.

\t, \h, \v, \c[NAME] and \x61 seem to work as normal.
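
To reproduce the comparison on a given Rakudo build, something like the following can be run. It is only a sketch: the `@cases` list, its labels, and the `%matched` hash are illustrative names, and the result for the lookahead with `\s` may differ between Rakudo versions, so no expected output is shown.

```raku
# Each pair maps a label to a regex; all are tried against the same string.
my @cases =
    'lookahead, no escape' => / <?[abc]>   <alpha> /,
    'lookahead with \s'    => / <?[abc\s]> <alpha> /,
    'lookahead with \t'    => / <?[abc\t]> <alpha> /,
    'plain class with \s'  => / <[abc\s]>  <alpha> /;

my %matched;
for @cases -> $case {
    my $m = 'abc' ~~ $case.value;
    # Record the matched text so the cases can be compared afterwards.
    %matched{$case.key} = $m.Str;
    say "{$case.key}: '{$m.Str}' (from {$m.from} to {$m.to})";
}
```

If the line for the lookahead with \s reports the same text as the plain character class, the lookahead is consuming a character as described above.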

edited Dec 20 '21 at 12:26

answered Dec 20 '21 at 11:36

Do you mean to say, "`<?[abc]>` behaves the same whether-or-not `\n`, `\s`, `\d` or `\w` are added"? – jubilatious1 Jan 05 '22 at 19:55

@jubilatious1 No. Without `\n`, `\s`, `\d` or `\w`, it works as it is supposed to. When you add `\n`, `\s`, `\d` or `\w`, it turns into `<[...]>`. – Markus Jarderot Jan 06 '22 at 16:50
