
I'm trying to implement a Markdown parser with a Perl 6 grammar and got stuck on blockquotes. A blockquote paragraph cannot be expressed in terms of nested braces because it is a list of specifically formatted lines. Semantically, though, it is nested Markdown.

Basically, it all came down to the following definition:

    token mdBlockquote {
        <mdBQLine>+ {
            my $quoted = [~] $<mdBQLine>.map: { $_<mdBQLineBody> };
        }
    }

The actual implementation of the mdBQLine token is not relevant here. The only important thing to note is that the mdBQLineBody key contains the quoted line with the leading > already stripped off. So, for a block:

    > # quote1
    > quote2
    >
    > quote3
    quote3.1

the $quoted scalar will contain:

    # quote1
    quote2

    quote3
    quote3.1

Now, the whole point is to have the above data parsed and injected back into the Match object $/. This is where I'm completely stuck. The most obvious solution:

    token mdBlockquote {
        <mdBQLine>+ {
            my $quoted = [~] $<mdBQLine>.map: { $_<mdBQLineBody> };
            $<mdBQParsed> = self.parse( $quoted, actions => self.actions );
        }
    }

fails for two reasons at once: first, $/ is a read-only object; second, .parse modifies it, effectively making it impossible to inject anything into the original tree.

Is there any solution other than post-analysing the parsed data, extracting and re-parsing blockquotes, and repeating?

Pat
Vadim Belman
    Have you tried using [grammar actions](https://docs.perl6.org/language/grammars#Action_Objects) combined with [`make`](https://docs.perl6.org/routine/make)? Also note that tokens can be used recursively, see for example [Parsing a possibly nested braced item using a grammar](https://stackoverflow.com/q/47124405/2173773) – Håkon Hægland Jul 16 '18 at 05:35

2 Answers

6

Expanding a little on @HåkonHægland's comment...

> $/ is a read-only object ... effectively making it impossible to inject anything into the original tree.

Not quite:

  • Pedantically speaking, $/ is a symbol and never an object, whether or not it's bound to one. If it's a parameter (and not declared with is rw or is copy), then it's read-only, but otherwise it can be freely rebound, e.g. $/ := 42.

  • But what you're referring to is assignment to a key. The semantics of assignment is determined by the item(s) being assigned to. If they're ordinary objects that are not containers then they won't support lvalue semantics and you'll get a Cannot modify an immutable ... error if you try to assign to them. A Match object is immutable in this sense.

What you can do is hang arbitrary data off any Match object by using the .make method on it. (The make routine calls this method on $/.) This is how you store custom data in a parse tree.

To access what's made in a given node of a parse tree / Match object, call .made (or .ast which is a synonym) on that node.

Typically what you make for higher nodes in a parse tree includes what was made for lower level nodes.
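As a minimal sketch of how make and .made fit together (the `Sum` grammar and `SumActions` class are hypothetical names invented for this illustration, not part of the Markdown parser in the question), a leaf node stores its own value and the top node combines what the lower nodes made:

```raku
# Hypothetical toy grammar: sums like "1+2+3".
grammar Sum {
    token TOP    { <number>+ % '+' }
    token number { \d+ }
}

class SumActions {
    # Leaf node: attach the numeric value to this Match.
    method number($/) { make +$/ }
    # Higher node: build on what the lower-level nodes made.
    method TOP($/)    { make [+] $<number>.map(*.made) }
}

my $match = Sum.parse('1+2+3', actions => SumActions.new);
say $match.made;              # 6
say $match<number>[0].made;   # 1
```

The parse tree itself is never modified; each node simply carries an extra payload that the node above it can read.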

Please try the following untested code out and see what you get, then comment if it fails miserably and you can't figure out a way to make it work, or build from there taking the last two paragraphs above into consideration, and comment on how it works out:

    token mdBlockquote {
        <mdBQLine>+ {
            make .parse: [~] $<mdBQLine>.map: { $_<mdBQLineBody> };
        }
    }
raiph
  • Thanks @raiph! Your proposal was on my list of ways to get around the issue. What I don't like about this approach is that I was thinking about the possibility of plugging in actions from a third party. For that purpose it would be much cleaner not to impose additional processing requirements on them, like 'a blockquote would set `.ast` and you must fetch it before continuing'. Basically, preserving the old `$/`, parsing, and storing the self-made AST from my actions works, but I hoped to see a way to call a rule and pass a text into it. – Vadim Belman Jul 16 '18 at 18:04
  • Maybe have third parties declare their actions in a role. Then wrap the methods in their role with an opening `callsame` and construct the modified role with wrapped methods as a class that inherits from your actions class. I think that'll work and also that there'll be a more elegant way to do what you want but I recommend you first get something that parses correctly without worrying about how you'll handle third-party actions. – raiph Jul 16 '18 at 18:36
  • 1
    Use of roles could be a way to do it. Anyway, you're right with the point that third-party actions are no close future anyway. It's just my habbit of planning more than just one step ahead. Otherwise the approach with pre-parsing in the rule action and later fetching in an action method works great and even manages nested quotes like a charm. It only required to pre-save `$/` in a scalar in the rule to use it later for `.make`. Thanks again for your help! – Vadim Belman Jul 16 '18 at 23:04
  • 1
    I played a bit with wrapping, hit bugs, remembered [this](https://rt.perl.org/Public/Bug/Display.html?id=129096#txn-1420146), gave up. So scrap wrapping. Instead maybe a `markdown-actions` class with a `makemore` sub that appends to whatever `markdown-actions` methods `make`, plus boilerplate instructions for third party action classes requiring they be declared `is markdown-actions` with each method being something like `callsame; makemore foo`. Then when that's working, use [MOP](https://docs.perl6.org/language/mop#index-entry-MOP) programming to eliminate the boilerplate. – raiph Jul 17 '18 at 23:03
5

Ok, here is the final solution I used. The grammar rule looks like this:

    token mdBlockquote {
        <mdBQLine>+ {
            my $m = $/;
            my $bq-body = [~] $m<mdBQLine>.map( { $_<mdBQLineBody> } );
            $m.make(
                self.WHAT.parse(
                    $bq-body,
                    actions => self.actions.clone,
                )
            );
        }
    }

The important trick here is backing up $/ in $m, because .parse will replace it.

The blockquote body is prefetched into $bq-body before calling .parse because there was a confusing side effect when the expression was passed directly as an argument.

.parse is called on self.WHAT to avoid messing with the current grammar object.

This rule ends up with $m.ast containing a Match object, which in turn contains the actions-generated data. The corresponding actions method then does the following:

    method mdBlockquote ($m) {
        my $bq = self.makeNode( "Blockquote" );
        $bq.push( $m.ast.ast );
        $m.make( $bq );
    }

Since the actions object builds a streamlined AST suitable for simple translation of Markdown into other formats, what happens here is that it fetches a branch of that tree generated by the recursive .parse and grafts it into the main tree.
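For readers who want to try the technique in isolation, here is a stripped-down sketch without an actions class. The `Quotes` grammar and its token names are hypothetical stand-ins for the real Markdown grammar, and it only shows the grammar side: saving `$/`, re-parsing the stripped body, and attaching the nested Match with `.make`.

```raku
# Hypothetical toy grammar: '>'-quoted lines and plain lines only.
grammar Quotes {
    token TOP   { [ <quote> || <plain> ]* }
    token quote {
        <qline>+ {
            my $m    = $/;   # keep the outer Match; .parse replaces $/
            my $body = [~] $m<qline>.map: { $_<body> ~ "\n" };
            # Re-parse the stripped text with a fresh grammar object.
            $m.make: self.WHAT.parse($body);
        }
    }
    token qline { '>' ' '? <body> \n }
    token body  { \N* }
    token plain { <!before '>'> \N* \n }
}

my $match = Quotes.parse("> outer\n> > inner\nplain\n");
my $outer = $match<quote>[0].made;   # Match for the stripped outer quote
my $inner = $outer<quote>[0].made;   # Match for the nested quote

say $outer<plain>[0].Str.trim;       # outer
say $inner<plain>[0].Str.trim;       # inner
```

Each level of quoting strips one `>` and hands the remainder to a new parse, so nesting depth is handled by recursion rather than by the grammar rules themselves.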

What is great is that the code supports nested blockquotes out of the box; no special handling is needed. What is not so good is that it is still a lot of extra code, whereas something like:

    token mdBlockquote {
        <mdBQLine>+ $<mdBQBody>={
            my $bq-body = [~] $<mdBQLine>.map( { $_<mdBQLineBody> } );
            self.WHAT.parse(
                $bq-body,
                actions => self.actions.clone,
            );
        }
    }

would look much better and wouldn't require the actions object to do anything beyond its normal duties.

Vadim Belman