How to cache and use the cached regexes in perl6 grammar?

Question

My code spends a lot of time on regex interpolation. Since the patterns rarely change, I guess caching the generated regexes should speed up the code, but I cannot figure out the right way to cache them and then use the cached versions.

The code parses arithmetic expressions. Because users are allowed to define new operators, the parser must be able to add new operators to the grammar. So the parser uses a table to record these new operators and generates regexes from the table on the fly.

#! /usr/bin/env perl6

use v6.c;

# the parser may add new operators to this table on the fly.
my %operator-table = %(
    1 => $['"+"', '"-"'],
    2 => $['"*"', '"/"'],
    # ...
);

# original code, runnable but slow.
grammar Operator {
    token operator(Int $level) {
        <{%operator-table{$level}.join('|')}>
    }

    # ...
}

# usage:
say Operator.parse(
    '+',
    rule => 'operator',
    args => \(1)
);
# output:
# ｢+｣

Here are some experiments:

# an attempt to cache the generated regexes; it does not work.
grammar CachedOperator {
    my %cache-table = %();

    method operator(Int $level) {
        if (! %cache-table{$level}) {
            %cache-table.append(
                $level => rx { <{%operator-table{$level}.join('|')}> }
            )
        }

        %cache-table{$level}
    }
}

# test:
say CachedOperator.parse(
    '+',
    rule => 'operator',
    args => \(1)
);
# output:
# Nil

# one more try
grammar CachedOperator_ {
    my %cache-table = %();

    token operator(Int $level) {
        <create-operator($level)>
    }

    method create-operator(Int $level) {
        if (! %cache-table{$level}) {
            %cache-table.append(
                $level => rx { <{%operator-table{$level}.join('|')}> }
            )
        }

        %cache-table{$level}
    }
}

# test:
say CachedOperator_.parse(
    '+',
    rule => 'operator',
    args => \(1)
);
# compile error:
# P6opaque: no such attribute '$!pos' on type Match in a Regex when trying to get a value

"# output: Nil" See [my SO answer about debugging grammars](https://stackoverflow.com/a/19640657/1077672). "# compile error: P6opaque: no such attribute '$!pos' on type Match in a Regex when trying to get a value" I haven't understood your code or tried to run it yet. But at first glance it looks like your code is returning a regex from a method in a grammar, whereas [the API is to return a Match object](https://stackoverflow.com/a/44941425/1077672), the result of applying a regex. Maybe I'm misunderstanding. I hope to try to run it later tonight and maybe I'll make more sense of it then. — raiph, Jan 19 '19 at 18:46
@raiph Yeah, I want to return a regex. It seems that I misunderstand the grammar. — lovetomato, Jan 20 '19 at 00:25
Finally got back to this but I'm too tired to work on it tonight. Hopefully I'll have some time tomorrow. In the meantime, maybe post another SO about what it would take to run P6 in Python, or better still, to run NQP in Python. (Better because NQP/nqp will be a lot faster/leaner and probably a lot more interesting to a lot more Python folk because it's basically a P6 regex and 6model engine kinda like the P6 equivalent of [PCRE](https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions) and a [CLOSish](https://en.wikipedia.org/wiki/Common_Lisp_Object_System) engine rolled into one). — raiph, Jan 20 '19 at 01:04
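To illustrate raiph's point that a grammar method has to return a Match rather than a regex, here is a minimal sketch of one way the cache could be wired up. It rests on assumptions that do not come from the thread: that building each level's regex once with EVAL is acceptable (it needs the MONKEY-SEE-NO-EVAL pragma and must never see untrusted input), and that calling the cached Regex with `self` as its argument yields the Match the grammar engine expects. The names `CachedOperatorSketch` and `%cache` are hypothetical.

```raku
use MONKEY-SEE-NO-EVAL;

# Same shape as the table in the question.
my %operator-table = 1 => ['"+"', '"-"'], 2 => ['"*"', '"/"'];

grammar CachedOperatorSketch {
    # Hypothetical cache: precedence level => compiled Regex object.
    my %cache;

    method operator(Int $level) {
        # Compile the alternation once per level (assumption: EVAL is acceptable here).
        my $regex = %cache{$level} //=
            EVAL 'rx/ ' ~ %operator-table{$level}.join(' | ') ~ ' /';

        # Apply the cached regex at the current position and return the
        # resulting Match, instead of returning the regex itself.
        $regex(self)
    }
}

say CachedOperatorSketch.parse('+', rule => 'operator', args => \(1));
```

If the parser later adds an operator to `%operator-table`, the corresponding `%cache` entry would have to be deleted so that the regex for that level is rebuilt on next use.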

score 4 · Answer 1 · answered Jan 19 '19 at 18:27

The following doesn't directly answer your question but may be of interest.

User defined operators

The following code declares an operator in P6:

sub prefix:<op> ($operand) { " $operand prefixed by op" }

Now one can use the new operator:

say op 42; # 42 prefixed by op

A wide range of operator positions and arities are covered, including choice of associativity and precedence, parentheses for grouping, etc. So maybe this is an appropriate way to implement what you're implementing.
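As a small sketch of the precedence control mentioned above, the following declares an infix operator and places it relative to an existing one. The operator name `avg` is a hypothetical choice for illustration; `is tighter` is the standard trait for making a new operator bind more tightly than an existing one.

```raku
use v6;

# Hypothetical infix operator that binds tighter than multiplication.
sub infix:<avg>($a, $b) is tighter(&infix:<*>) { ($a + $b) / 2 }

say 2 * 4 avg 6; # 10, because it parses as 2 * (4 avg 6)
```

The related traits `is looser` and `is equiv` position an operator below or level with an existing one, and `is assoc` sets associativity.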

Although it's slow, it might be fast enough. Additionally, as Larry said in 2017 ...

we know some places in the parser that are slower than they should be, for instance ... various lexers relook at various characters in your Perl 6 program, it averages 5 or 6 times on every character, which is obviously deeply sub-optimal, and we know how to fix it

... and with luck Jonathan will work on the P6 grammar parser this year.

DSLs and Slangs

Even if you aren't interested in using the main language's ability to declare user defined operators, or can't for some reason, the underlying mechanisms that make it work might be of interest/use. Here are some references:

Brian Duggan's Informal DSLs presentation (video, slides).
Mouq's 2014 gist Slangs.
Larry Wall's speculation from way back when in Switching parsers and Slangs.

Thanks a lot for your suggestions. Actually I want to modify the grammar of python to support user-defined operators. Can I use perl6 in python code? If so it will be great. — lovetomato, Jan 20 '19 at 00:33
I don't know of a way to call Rakudo (or the NQP grammar engine, which would be a lot faster and completely viable if the main usage is a grammar) from Python in a way similar to how one can [call Python from P6](https://github.com/niner/inline-python). But it would be nice, wouldn't it? If you have grit, know a little C, and chatted with P6 core devs you could maybe make it happen. Stefan started Inline::Perl5 in a few hours which he explains in [3 minutes...](https://www.youtube.com/watch?v=m_Y-lvQP6jI). — raiph, Jan 20 '19 at 00:50
