1

Assuming I have a BNF grammar as shown below. Now a 'List' will correspond to all terms before the '|' symbol. However, I want to read the very last number of every 'List' as an attribute of the 'List'.

<code> ::= <code> <line> 12 2 | <line> 24 4 
<line> ::= <ifte> 13 23 | <loop> 24 34 | <action> 15 3 
<ifte> ::= if <cond> {<code>} else {<code>} 12

Furthermore, this last number (List attribute) can be optional; I guess to make this easier I might have to maybe use some symbol to enclose the number for easier parsing e.g <<23>>.

The code from here compiles but it doesn't parse the grammar above:

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted.hpp>
/*#include <fmt/ranges.h>
#include <fmt/ostream.h>*/
#include <iomanip>

namespace AST {

    struct Name : std::string {
        using std::string::string;
        using std::string::operator=;

        friend std::ostream &operator<<(std::ostream &os, Name const &n) {
            return os << '<' << n.c_str() << '>';
        }
    };

    using Term = boost::variant<Name, std::string>;

    struct List {
        std::vector<Term> terms;
        int number;
    };

    using Expression = std::vector<List>;

    struct Rule {
        Name name; //rhs
        Expression rhs;
    };

    using Syntax = std::vector<Rule>;
}
BOOST_FUSION_ADAPT_STRUCT(AST::List, terms, number)
BOOST_FUSION_ADAPT_STRUCT(AST::Rule, name, rhs)

namespace Parser {

    namespace qi = boost::spirit::qi;
    template<typename Iterator>
    class BNF : public qi::grammar<Iterator, AST::Syntax()> {
    public:
        BNF() : BNF::base_type(start) {
            start       = qi::skip(blank)[rule % +qi::eol];
            _rule_name  = qi::hold[qi::char_('<') >> (qi::alpha >> *(qi::alnum | qi::char_('-'))) >> qi::char_('>')];
            _list       = +term >> qi::uint_;
            term        = _literal | _rule_name;
            _literal    = qi::hold['"' >> *(character - '"') >> '"']
                        | qi::hold["'" >> *(character - "'") >> "'"]
                        | qi::hold[+(qi::graph - qi::char_("<|>") - "::=")];
            character   = qi::alnum | qi::char_("\"'| !#$%&()*+,./:;>=<?@]\\^_`{}~[-");
            _expression = _list % '|';

            rule = _rule_name >> "::=" >> _expression;

            BOOST_SPIRIT_DEBUG_NODES((rule)(_expression)(_list)(term)(_literal)(
                character)(_rule_name))
        }

    private:
        qi::rule<Iterator> blank;
        qi::rule<Iterator, AST::Syntax()>     start;
        qi::rule<Iterator, AST::Rule(),       qi::rule<Iterator>> rule;
        qi::rule<Iterator, AST::Expression(), qi::rule<Iterator>> _expression;
        qi::rule<Iterator, AST::List(),       qi::rule<Iterator>> _list;
        qi::rule<Iterator, AST::Term()>       term;
        qi::rule<Iterator, AST::Name()>       _rule_name;
        qi::rule<Iterator, std::string()>     _literal;
        qi::rule<Iterator, char()>            character;
    };
}

int main() {
    Parser::BNF<std::string::const_iterator> const  parser;
}

How can I fix/modify the code link above to suit my needs.

sehe
  • 374,641
  • 47
  • 450
  • 633
r360
  • 49
  • 4

1 Answers1

1

I think it's unclear what input grammar you want to support.

E.g.,

  1. when list attributes can be optional, does that mean that instead of <code> <line> 12 2 this would also be a valid list without the attribute: <code> <line> 12 2? How would you avoid parsing the 12 as the attribute?
  2. your input uses names in {} - which the parser implementation you show clearly doesn't support. Do you need to support that? How?

Let's address them both

ad 2.: Fixing your input

Let's assume you really don't want magic meaning to {}, but intended them as literals in your grammar. Like "if" and "else" they needed to be literals, so:

<ifte> ::= 'if' <cond> '{' <code> '}' 'else' '{' <code> '}' 23

or

<ifte> ::= "if" <cond> "{" <code> "}" "else" "{" <code> "}" 23

That fixes your sample: Live On Compiler Explorer:

code ::= <code><line> 34 | <line> 34
line ::= <ifte> 23 | <loop> 34 | <action> 23
ifte ::= if<cond>{<code>}else{<code>} 23
Remaining: "
"

ad 1.: Optional attributes

Let's express our intent:

using ListAttribute = int;

struct List {
    std::list<Term> terms;
    ListAttribute attribute;
};

And then in the grammar add a lexeme rule (no skipper):

qi::rule<Iterator, Ast::ListAttribute()> _attribute;

Which we then implement like:

_attribute  = lexeme [ "<<" >> qi::uint_ >> ">>" ] 
            | qi::attr(0);
_list       = +_term >> _attribute;

Now it will recognize only <> as list attribute:

Live On Compiler Explorer

std::string const input =
    "<code> ::= <code> <line> | <line>\n"
    "<line> ::= <ifte> | <loop> | <action>\n"
    "<ifte> ::= 'if' <cond> '{' <code> '}' 'else' '{' <code> '}'\n"

    "<code> ::= <code> <line> <<34>> | <line> <<34>>\n"
    "<line> ::= <ifte> <<23>> | <loop> <<34>> | <action> <<23>>\n"
    "<ifte> ::= 'if' <cond> '{' <code> '}' 'else' '{' <code> '}' <<23>>\n"

    // and the disambiguated example from the question
    "<code> ::= <code> <line> '34' | <line> '12' <<2>>\n"
;

Printing

code ::= <code><line> 0 | <line> 0
line ::= <ifte> 0 | <loop> 0 | <action> 0
ifte ::= if<cond>{<code>}else{<code>} 0
code ::= <code><line> 34 | <line> 34
line ::= <ifte> 23 | <loop> 34 | <action> 23
ifte ::= if<cond>{<code>}else{<code>} 23
code ::= <code><line>34 0 | <line>12 2
Remaining: "
"

Summary/Bonus

I just realized that you don't need to disambiguate between 12 2 and 12 (missing attribute) because 12 isn't a valid input token anyways (literals/names start with one of <"'), so here goes:

Live On Compiler Explorer

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted.hpp>
#include <fmt/ranges.h>
#include <fmt/ostream.h>
#include <iomanip>
namespace qi = boost::spirit::qi;

namespace Ast {
    struct Name : std::string {
        using std::string::string;
        using std::string::operator=;

        friend std::ostream& operator<<(std::ostream& os, Name const& n) {
            return os << '<' << n.c_str() << '>';
        }
    };

    using Term = boost::variant<Name, std::string>;

    using ListAttribute = int;

    struct List {
        std::list<Term> terms;
        ListAttribute attribute;

        friend std::ostream& operator<<(std::ostream& os, List const& l) {
            for (auto& t : l.terms)
                os << t;
            return os << " " << l.attribute;
        }
    };

    using Expression = std::list<List>;

    struct Rule {
        Name name; // lhs
        Expression rhs;
    };

    using Syntax = std::list<Rule>;
}

BOOST_FUSION_ADAPT_STRUCT(Ast::List, terms, attribute)
BOOST_FUSION_ADAPT_STRUCT(Ast::Rule, name, rhs)

namespace Parser {
    template <typename Iterator>
    struct BNF: qi::grammar<Iterator, Ast::Syntax()> {
        BNF(): BNF::base_type(start) {
            using namespace qi;
            start = skip(blank) [ _rule % +eol ];

            _rule       = _rule_name >> "::=" >> _expression;
            _expression = _list % '|';
            _attribute  = uint_ | qi::attr(0);
            _list       = +_term >> _attribute;
            _term       = _literal | _rule_name ;
            _literal    = '"' >> *(_character - '"') >> '"'
                        | "'" >> *(_character - "'") >> "'";
            _character  = alnum | char_("\"'| !#$%&()*+,./:;>=<?@]\\^_`{}~[-");
            _rule_name  = '<' >> (alpha >> *(alnum | char_('-'))) >> '>';

            BOOST_SPIRIT_DEBUG_NODES(
                (_rule)(_expression)(_list)(_attribute)(_term)
                (_literal)(_character)
                (_rule_name))
        }

      private:
        qi::rule<Iterator, Ast::Syntax()>     start;
        qi::rule<Iterator, Ast::Rule(),       qi::blank_type> _rule;
        qi::rule<Iterator, Ast::Expression(), qi::blank_type> _expression;
        qi::rule<Iterator, Ast::List(),       qi::blank_type> _list;
        // lexemes
        qi::rule<Iterator, Ast::ListAttribute()> _attribute;
        qi::rule<Iterator, Ast::Term()>          _term;
        qi::rule<Iterator, Ast::Name()>          _rule_name;
        qi::rule<Iterator, std::string()>        _literal;
        qi::rule<Iterator, char()>               _character;
    };
}

int main() {
    Parser::BNF<std::string::const_iterator> const parser;

    std::string const input =
        "<code> ::= <code> <line> | <line>\n"
        "<line> ::= <ifte> | <loop> | <action>\n"
        "<ifte> ::= 'if' <cond> '{' <code> '}' 'else' '{' <code> '}'\n"

        "<code> ::= <code> <line> 34 | <line> 34\n"
        "<line> ::= <ifte> 23 | <loop> 34 | <action> 23\n"
        "<ifte> ::= 'if' <cond> '{' <code> '}' 'else' '{' <code> '}' 23\n"

        // and the disambiguated example from the question
        "<code> ::= <code> <line> '34' | <line> '12' 2\n"
    ;

    auto it = input.begin(), itEnd = input.end();

    Ast::Syntax syntax;
    if (parse(it, itEnd, parser, syntax)) {
        for (auto& rule : syntax)
            fmt::print("{} ::= {}\n", rule.name, fmt::join(rule.rhs, " | "));
    } else {
        std::cout << "Failed\n";
    }

    if (it != itEnd)
        std::cout << "Remaining: " << std::quoted(std::string(it, itEnd)) << "\n";
}

Printing

code ::= <code><line> 0 | <line> 0
line ::= <ifte> 0 | <loop> 0 | <action> 0
ifte ::= if<cond>{<code>}else{<code>} 0
code ::= <code><line> 34 | <line> 34
line ::= <ifte> 23 | <loop> 34 | <action> 23
ifte ::= if<cond>{<code>}else{<code>} 23
code ::= <code><line>34 0 | <line>12 2
Remaining: "
"
sehe
  • 374,641
  • 47
  • 450
  • 633
  • the reason why I said maybe I have to use enclosing symbols such as `<>` for the attribute is because a literal can be without the single or double quotes; that's why the literal has a third rule to cater for that. – r360 Apr 22 '21 at 18:38
  • @r360 I completely missed that. Because you said "the code from *here*" and I took it to mean the code I ended up with at the previous answer. Lesson: next time just include the code (I edited your question to do that, but I still missed that the code was different). – sehe Apr 22 '21 at 19:07
  • I'll give it another look – sehe Apr 22 '21 at 19:07
  • What's the deal with `blank`? You never initialize it. If you just don't want any skipper, why don't you just remove it? – sehe Apr 22 '21 at 19:15
  • So, I just managed to scrutinize all the edits to find that you had no other changes besides unitialized rule (`blank`), reordered and inconsistently renamed rules, probably confused `qi::hold[]` and a illogical decision to include '<' and '>' inside the rule name. Oh and removing the test code, so we can't actually see the goal or what works. Oh, and it all segfaults immediately due to the use of a copy-constructed temporary in `qi::skip`. – sehe Apr 22 '21 at 19:42
  • Ten more minutes down the road I understand that expression doesn't parse the simplest test cases anymore because `_literal` eats rule names. Oh no, that's actually guarded (I kinda like what you did with `_literal`, except for the `hold[]`s!). Still it doesn't parse, because... yep the skipped was disabled. I'mma revert that. – sehe Apr 22 '21 at 19:46
  • For the blank I want to enable multi line parsing for the rule. Here is the [code](https://godbolt.org/z/vr6P7EGK3). Except it doesn't work. – r360 Apr 22 '21 at 19:50
  • Still the ordering of literal needed to be reversed (_rule_name first) because otherwise `<>` would be parsed with the rulename rule: here it is all put together working again https://godbolt.org/z/TMrqsr1To – sehe Apr 22 '21 at 19:54
  • @r360 _"For the blank I want to enable multi line parsing for the rule"_ see https://stackoverflow.com/questions/62967856/how-to-handle-multi-line-rules-for-gor-parsing-bnf-grammar-using-boost-spirit-qi/62970956#62970956 – sehe Apr 22 '21 at 19:55