How to use matching delimiters in Raku

Question

I'm trying to write a token that allows nested content with matching delimiters. Where (AB) should result in a match to at least "AB" if not "(AB)". And (A(c)B) would return two matches "(A(c)B)" and so on.

Code boiled down from its source:

#!/home/hsmyers/rakudo741/bin/perl6
use v6d;

my @tie;

class add-in {
    method tie($/) { @tie.push: $/; }
}

grammar tied {
    rule TOP { <line>* }
    token line {
        <.ws>?
        [
            | <tie>
            | <simpleNotes>
        ]+
        <.ws>?
    }
    token tie {
        [
            || <.ws>? <simpleNotes>+ <tie>* <simpleNotes>* <.ws>?
            || <openParen> ~ <closeParen> <tie>
        ]+
    }
    token openParen { '(' }
    token closeParen { ')' }
    token simpleNotes {
        [
            | <[A..Ga..g,'>0..9]>
            | <[|\]]>
            | <blank>
        ]
    }
}

my $text = "(c2D) | (aA) (A2 | B)>G A>F G>E (A,2 |\nD)>F A>c d>f |]";

tied.parse($text, actions => add-in.new).say;
$text.say;
for (@tie) {
    s:g/\v/\\n/;
    say "«$_»";
}

This gives a partially correct result of:

«c2D»
«aA»
«(aA)»
«A2 | B»
«\nD»
«A,2 |\nD»
«(A,2 |\nD)>F A>c d>f |]»
«(c2D) | (aA) (A2 | B)>G A>F G>E (A,2 |\nD)>F A>c d>f |]»

BTW, I'm not concerned about the newline; it is there only to check whether the approach can span text over two lines. So, stirring the ashes, I see captures with and without parentheses, and a very greedy capture or two.

Clearly I have a problem within my code. My knowledge of perl6 can best be described as "beginner", so I ask for your help. I'm looking for a general solution, or at least an example that can be generalized, and as always suggestions and corrections are welcome.

cf https://stackoverflow.com/questions/54940877/recursive-regular-expression-in-perl-6 — raiph, Jan 17 '20 at 01:21
`Where (AB) should result in a match to at least "AB" if not "(AB)".` I'm not sure what that means. Should (AB) return two matches or one? `And (A(c)B) would return two matches "(A(c)B)" and so on."` Even more confusing. What two matches? `This gives a partially correct result of:` Are you saying *all* the matches shown are good but some are missing or that *some* are correct but others aren't? — raiph, Jan 17 '20 at 01:37
If you want to receive two match objects, unfortunately, the way that grammars are designed you can only get a single match. I remember someone did a presentation about returning multiple matches but I can't remember what his library name was. — user0721090601, Jan 17 '20 at 02:52
Since at least one return had the parenthesis included, I fully expected a series of matches with them included. I find the fact that I get matches with and without confusing. Regards "What two matches," There are two sets of parentheses in (A(c)B), so I would expect a match for (c) and a match for (A(c)B). As for (AB), if parentheses are included as had been my intention, then the match would be (AB). I did not mean two returns at once but one after the other. — hsmyers, Jan 17 '20 at 04:19

score 7 · Accepted Answer · answered Jan 17 '20 at 05:19

Your grammar has a few added complexities. For instance, you define a tie as being either (...) or just the ... inside. But that inner content is identical to what line matches.

Here's a rewritten grammar that greatly simplifies what you want. When writing grammars, it's helpful to start from the small and go up.

grammar Tied {
    rule  TOP   { <notes>+ %% \v+ }
    token notes {
        [
        | <tie>
        | <simple-note>
        ] + 
        %%
        <.ws>?
    }
    token open-tie    { '(' }
    token close-tie   { ')' }
    token tie         { <.open-tie> ~ <.close-tie> <notes> }
    token simple-note { <[A..Ga..g,'>0..9|\]]>             }
}

A few stylistic notes here. Grammars are classes, and it's customary to capitalize them. Tokens are methods, and tend to be lower case with kebab-case names (you can of course use any style you want, though). In the tie token, you'll notice that I used <.open-tie>. The . means that we don't need to capture it (that is, we're just using it for matching and nothing else). In the notes token I was able to simplify things a lot by using %% and by making TOP a rule, which automatically adds some whitespace matching.
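To make the two syntax points above concrete, here is a minimal sketch on a made-up grammar (Demo, item and comma are hypothetical names, not part of the grammar above). It relies only on standard Raku regex behavior: <.name> calls a token without capturing it, and %% works like % but also tolerates a trailing separator.

```raku
# Hypothetical mini-grammar, only to illustrate <.name> and %%.
grammar Demo {
    token TOP   { <item>+ %% <.comma> }  # items separated by commas; a trailing comma is allowed
    token item  { \w+ }
    token comma { ',' }
}

my $m = Demo.parse("a,b,c,");
say $m<item>.map(*.Str).join(" ");  # the three captured items: a b c
say $m<comma>.defined;              # False: <.comma> matched but was not captured
```

With % in place of %%, the same input should fail to parse, because the trailing comma would be left unconsumed and .parse requires the whole string to match.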

Now, the order that I would create the tokens is this:

<simple-note> because it's the most base level item. A group of them would be
<notes>, so I make that next. While doing that, I realize that a run of notes can also include a…
<tie>, so that's the next one. Inside of a tie though I'm just going to have another run of notes, so I can use <notes> inside it.
<TOP> at last, because if a line just has a run of notes, we can omit a separate line token and use <notes>+ %% \v+ directly.
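One way to follow that order in practice is to test each token on its own as soon as it is written. The sketch below repeats the grammar above and uses the :rule named argument of Grammar.parse to start matching at a token other than TOP. The sample inputs are made up for illustration.

```raku
grammar Tied {
    rule  TOP   { <notes>+ %% \v+ }
    token notes {
        [
        | <tie>
        | <simple-note>
        ]+
        %%
        <.ws>?
    }
    token open-tie    { '(' }
    token close-tie   { ')' }
    token tie         { <.open-tie> ~ <.close-tie> <notes> }
    token simple-note { <[A..Ga..g,'>0..9|\]]>             }
}

# 1. The smallest piece first: a single note character.
say so Tied.parse("c", :rule<simple-note>);      # True
say so Tied.parse("h", :rule<simple-note>);      # False: h is outside A..G and a..g

# 2. A run of notes, with optional whitespace between them.
say Tied.parse("ab c", :rule<notes>).Str;

# 3. A tie, whose content is itself a run of notes (and may nest).
say Tied.parse("(b(c))", :rule<tie>)<notes>.Str; # b(c)
```

Only balanced input is used here on purpose. With the ~ construct, a missing closing delimiter is reported as an error rather than as a quiet failed match, so unbalanced test cases would need their own error handling.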

Actions (often given the same name as your grammar, plus -Actions, so here I use class Tied-Actions { … }) are normally used to create an abstract syntax tree. But really, the best way to think of this is asking each level of the grammar what we want from it. I find that whereas writing grammars it's easiest to build from the smallest element up, for actions, it's easiest to go from the TOP down. This will also help you build more complex actions down the road:

What do we want from TOP?
In our case, we just want all the ties that we found in each <notes> token. That can be done with a simple loop (because we put a quantifier on <notes>, $<notes> will be Positional):
method TOP ($/) { my @ties; @ties.append: .made for $<notes>; make @ties; }
The above code creates our temp variable, loops through each <notes> match, and appends everything that <notes> made for us, which is nothing at the moment, but that's okay. Then, because we want the ties from TOP, we make them, which allows us to access them after parsing.
What do you want from <notes>?
Again, we just want the ties (but maybe some other time, you want ties and glisses, or some other information). So we can grab the ties basically doing the exact same thing:
method notes ($/) { my @ties; @ties.append: .made for $<tie>.grep(*.defined); make @ties; }
The only difference is that rather than looping over just $<tie>, we have to grab only the defined ones. This is a consequence of doing the [<foo>|<bar>]+: $<foo> will have a slot for each quantified match, whether or not <foo> did the matching (this is when you would often want to pop things out to, say, a proto token note with a tie and a simple-note variant, but that's a bit advanced for this). Again, we grab whatever $<tie> made for us (we'll define that later) and we "make" it. Whatever we make is what other actions will find made by <notes> (like in TOP).
What do you want from <tie>? Here I'm going to just go for the content of the tie — it's easy enough to grab the parentheses too if you want. You'd think we'd just use make ~$<notes>, but that leaves off something important: $<notes> also has some ties. But those are easy enough to grab:
method tie ($/) { my @ties = ~$<notes>; @ties.append: $<notes>.made; make @ties; }
This ensures that we pass along not only the current outer tie, but also each individual inner tie (which in turn may have another inner one, and so on).
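Put together, the three methods above form one actions class. The sketch below simply assembles them under the name Tied-Actions suggested earlier and repeats the grammar so that it can be run on its own. Nothing in it is new apart from the layout.

```raku
grammar Tied {
    rule  TOP   { <notes>+ %% \v+ }
    token notes {
        [
        | <tie>
        | <simple-note>
        ]+
        %%
        <.ws>?
    }
    token open-tie    { '(' }
    token close-tie   { ')' }
    token tie         { <.open-tie> ~ <.close-tie> <notes> }
    token simple-note { <[A..Ga..g,'>0..9|\]]>             }
}

class Tied-Actions {
    # TOP: collect the ties reported by every run of notes.
    method TOP   ($/) { my @ties; @ties.append: .made for $<notes>;               make @ties; }
    # notes: collect the ties reported by every tie found in this run.
    method notes ($/) { my @ties; @ties.append: .made for $<tie>.grep(*.defined); make @ties; }
    # tie: report this tie's own content, followed by any ties nested inside it.
    method tie   ($/) { my @ties = ~$<notes>; @ties.append: $<notes>.made;        make @ties; }
}

# No attributes are used, so the type object itself can serve as the actions.
my @ties = Tied.parse("a(b(c))d", actions => Tied-Actions).made;
say @ties;   # outer tie content first, then the nested one: [b(c) c]
```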

When you parse, all you need to do is grab the .made of the Match:

say Tied.parse("a(b(c))d");
# ｢a(b(c))d｣
# notes => ｢a(b(c))d｣
#  simple-note => ｢a｣
#  tie => ｢(b(c))｣          <-- there's a tie!
#   notes => ｢b(c)｣
#    simple-note => ｢b｣
#    tie => ｢(c)｣           <-- there's another!
#     notes => ｢c｣
#      simple-note => ｢c｣
#  simple-note => ｢d｣
say Tied.parse("a(b(c))d", actions => Tied-Actions).made;
# [b(c) c]

Now, if you really only will ever need the ties and nothing else (which I don't think is the case), you can do things much more simply. Using the same grammar, use instead the following actions:

class Tied-Actions {
    has @!ties;
    method TOP ($/) { make @!ties            }
    method tie ($/) { @!ties.push: ~$<notes> }
}

This has several disadvantages compared with the previous version: while it works, it's not very scalable. While you'll get every tie, you won't know anything about its context. Also, you have to instantiate Tied-Actions (that is, actions => Tied-Actions.new), whereas if you can avoid using any attributes, you can pass the type object.
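As a rough sketch of what that looks like in use, the following repeats the grammar and the attribute-based actions, then parses two made-up inputs with a fresh instance each. The variable names and the second input are assumptions for illustration only.

```raku
grammar Tied {
    rule  TOP   { <notes>+ %% \v+ }
    token notes {
        [
        | <tie>
        | <simple-note>
        ]+
        %%
        <.ws>?
    }
    token open-tie    { '(' }
    token close-tie   { ')' }
    token tie         { <.open-tie> ~ <.close-tie> <notes> }
    token simple-note { <[A..Ga..g,'>0..9|\]]>             }
}

class Tied-Actions {
    has @!ties;
    method TOP ($/) { make @!ties            }
    method tie ($/) { @!ties.push: ~$<notes> }
}

# A fresh instance per parse keeps the ties of different inputs apart.
my @first  = Tied.parse("a(b(c))d", actions => Tied-Actions.new).made;
my @second = Tied.parse("(e)f",     actions => Tied-Actions.new).made;

say @first.sort;   # ties are pushed as each tie token finishes, so sort for a stable view
say @second;
```

Reusing a single instance for several parses would keep accumulating ties from earlier inputs, since @!ties is never reset. That is one more way the state-based version is harder to scale.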

A ton of material to digest! The new knowledge is part of my reason for taking on perl6 after 20 years of perl(5) so I appreciate what you have imparted. For me, programming is about solving problems and learning new things. So off I go to do just—learn… — hsmyers, Jan 17 '20 at 06:35
hsmyers: whenever I get stuck on my grammars, restructuring them can work the best. “redflags” for me (things that probably need to be refactored) are tokens containing non-semantically-similar things (they should probably become tokens themselves), any repetition of regex syntax for similar functions (they should probably be upgraded to tokens), and tokens containing only another token (or repetition of a token). *Sometimes* there's a good reason for any of these (normally reflected in how they need to be handled by actions) but they should definitely make you think twice if you encounter — user0721090601, Jan 21 '20 at 06:02
This is from my 4th rewrite with numerous side-paths! But each one gets better as I learn what I'm doing. It is somewhat frustrating to know what I want to do but not to know how perl6 does it! Thanks for the flag list!! — hsmyers, Jan 22 '20 at 23:24
