I've been learning C++ and Boost Spirit lately to help my team parse a complex structure into an AST and then into XML. Thanks to a lot of help from this community (mostly from sehe), things have been moving along pretty well. With my limited knowledge of C++ and Boost, I'm stuck again... :( What's new, right?

I'm trying to parse the following structure:

If (Var1 == "Test" && Var2 <= 10 && Var3 == "Done")
  Verify Word 32 Objective;
  If ((Var3 == "A" || Var4 == "B") && Var5 > 0)
     Assign VarName "Value1";
     Assign Var2 10;
  Elseif (Var3 == "C")
    Assign VarName "SomeValue"
  End If;
Else
  Assign VarName "Value2"
EndIf;

Notes

  1. An If statement follows one of these formats: If-ElseIf-Else-EndIf; or If-Else-EndIf; or If-EndIf;
  2. Inside the If and Else (or ElseIf) blocks, there could be a for-loop, a while-loop and/or simple statements like in the example above.

The expected XML output for the code block above looks like this:

<if>
    <bioperator>
        &&
        <bioperator>
        &&
            <bioperator>
            ==
                <variant>Var1</variant>
                <literal>"Test"</literal>
            </bioperator>
            <bioperator>
            <=
                <variant>Var2</variant>
                <num>10</num>
            </bioperator>
        </bioperator>
        <bioperator>
        ==
            <variant>Var3</variant>
            <literal>"DONE"</literal>
        </bioperator>
    </bioperator>
    <then>
        <verify>
            <word>Word</word>
            <num>32</num>
            <obj>Objective</obj>
        </verify>
        <if>
            <bioperator>
            &&
                <bioperator>
                    ||
                    <bioperator>
                    ==
                        <variant>Var3</variant>
                        <literal>"A"</literal>
                    </bioperator>
                    <bioperator>
                    ==
                        <variant>Var4</variant>
                        <literal>"B"</literal>
                    </bioperator>
                </bioperator>
                <bioperator>
                >
                    <variant>Var5</variant>
                    <num>0</num>
                </bioperator>
            </bioperator>
            <then>
                <assign>
                    <variant>VarName</variant>
                    <literal>"Value1"</literal>
                </assign>
                <assign>
                    <variant>Var2</variant>
                    <num>10</num>
                </assign>
            </then>
            <elseif>
                <bioperator>
                ==
                    <variant>Var3</variant>
                    <literal>"C"</literal>
                </bioperator>
                <assign>
                    <variant>VarName</variant>
                    <literal>"Value2"</literal>
                </assign>
            </elseif>       
        </if>
    </then>
    <else>
        <assign>
            <variant>VarName</variant>
            <literal>"Value2"</literal>
        </assign>
    </else>
</if>

Output Notes:

  1. Only if block has <then>...</then> tag
  2. <bioperator> tag is nested.

For simplicity, I'm starting from the working example that sehe helped me with before and extending it to parse the structure above.

Compiler Explorer

FULL CODE

#define BOOST_SPIRIT_DEBUG 1
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix/phoenix.hpp>
#include <iomanip>

namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;

namespace Ast {
    using boost::recursive_wrapper;

    template <typename> struct custom_string : std::char_traits<char> {};
    template <typename Tag>
    using String = std::basic_string<char, custom_string<Tag> >;

    using Identifier = String<struct TagId>;
    using Literal = String<struct TagLiteral>;
    using Variant = String<struct TagVariant>;
    using Word = String<struct TagWord>;
    using Obj = String<struct TagObj>;
    using BinOp = String<struct TagOp>;

    using Datatype = String<struct TagDatatype>;
    struct Base {
        Identifier id;
        Literal literal;
    };

    using Ids = std::vector<Identifier>;
    using Enum = Ids;

    using Number = double;
    using Value = boost::variant<Literal, Number, Identifier, Variant>;

    struct Simple : Base {
        boost::optional<Enum>     enumeration;
        boost::optional<Datatype> datatype;
        boost::optional<Value>    default_;
    };

    struct Complex;
    struct Container;
    ;
    using Class = boost::variant<
        Simple,
        recursive_wrapper<Complex>,
        recursive_wrapper<Container>
    >;

    using Classes = std::vector<Class>;
    struct Container : Base { Class element; };
    struct Complex : Base { Ids bases; Classes members; };

    // Expression block
    struct Verify {
        Word word;
        Number num;
        Obj obj;
    };
    struct Assign {
        Variant var;
        Value value;
    };

    struct Bioperator;
    struct Conditional;

    using Expression = boost::variant<
        Value,
        Verify,
        Assign,
        recursive_wrapper<Bioperator>,
        recursive_wrapper<Conditional>
    >;

    struct Bioperator {
        Variant var;
        BinOp op;
        Value value;
    };

    struct Conditional {
        Expression condition, true_block;
        boost::optional<Expression> false_block;
    };

    using Code = boost::variant<Conditional, Verify, Assign>;

    using Task = std::vector<boost::variant<Class, Code>>;
} // namespace Ast

// Classes
BOOST_FUSION_ADAPT_STRUCT(Ast::Simple, id, literal, enumeration, datatype, default_)
BOOST_FUSION_ADAPT_STRUCT(Ast::Complex, id, literal, bases, members)
BOOST_FUSION_ADAPT_STRUCT(Ast::Container, id, literal, element)

// Expressions
BOOST_FUSION_ADAPT_STRUCT(Ast::Verify, word, num, obj);
BOOST_FUSION_ADAPT_STRUCT(Ast::Assign, var, value);
BOOST_FUSION_ADAPT_STRUCT(Ast::Bioperator, var, op, value);
BOOST_FUSION_ADAPT_STRUCT(Ast::Conditional, condition, true_block, false_block);

namespace Parser {
    template <typename It> struct Task : qi::grammar<It, Ast::Task()> {
        Task() : Task::base_type(start) {
            using namespace qi;

            start = skip(space)[task_];

            // lexemes:
            id_ = raw[alpha >> *(alnum | '_' | ':')];
            variant_ = id_;
            word_ = variant_;
            obj_ = word_;
            literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';

            auto optlit = copy(literal_ | attr(std::string(" ")));

            task_ = *task_item > eoi;
            task_item = class_ | code_;
            subclass_ = simple_class_ | complex_ | container_;
            class_ = lit("Class") > subclass_ > ';';
            simple_class_ = lit("Simple") >> id_ >> optlit >> -enum_ >> -datatype_ >> -default_;
            inherit_ = lit("Inherit") >> id_;
            complex_ = lit("Complex") >> id_ >> optlit >> '(' >> *inherit_ >> *subclass_ >> ')';
            container_ = lit("Container") >> id_ >> optlit >> '(' >> subclass_ > ')';
            enum_ = lit("enumeration") >> '(' >> -(id_ % ',') > ')';
            datatype_ = lit("datatype") >> id_;
            value_ = literal_ | number_ | id_;
            number_ = double_;
            default_ = lit("Default") >> value_;

            // Expression 
            code_ = conditional_ | assign_ | verify_;   // more to come
            expr_ = simple_expr | assign_ | verify_;// | *(boolop_ >> bioperator_);
            simple_expr = value_ | bioperator_ | verify_ | assign_ | conditional_;
            bioperator_ = '(' >> variant_ >> binop_ >> value_ >> ')';
            assign_ = no_case["assign"] >> variant_ >> value_ > ';';
            verify_ = no_case["verify"] >> word_ >> number_ >> obj_ > ';';
            conditional_
                = no_case["if"] >> '(' >> expr_ >> ')' >> expr_
                //>> -((lit("else") | lit("elseif")) >> expr_)  // else & elseif
                >> no_case["endif"] > ';'
                ;
            //elsepart_ = no_case[lit("else")] >> expr_;
            //elseifpart_ = no_case["elseif"] >> conditional_;
            binop_ = string("==") | string("!=") | string(">") | string(">=") | string("<") | string("<=");
            boolop_ = string("||") | string("&&");

            BOOST_SPIRIT_DEBUG_NODES(
                (task_)(task_item)(class_)(subclass_)(simple_class_)(complex_)(container_)(enum_)(datatype_)(default_)(inherit_)
                (id_)(literal_)(variant_)(word_)(value_)(number_)(obj_)
                (expr_)(verify_)(assign_)(conditional_)(assign_)(binop_)(boolop_)
            )
        }

    private:
        qi::rule<It, Ast::Task()> start;

        using Skipper = qi::space_type;
        qi::rule<It, Ast::Task(), Skipper>        task_, task_item;
        qi::rule<It, Ast::Class(), Skipper>       class_, subclass_;
        qi::rule<It, Ast::Simple(), Skipper>      simple_class_;
        qi::rule<It, Ast::Complex(), Skipper>     complex_;
        qi::rule<It, Ast::Container(), Skipper>   container_;
        qi::rule<It, Ast::Enum(), Skipper>        enum_;
        qi::rule<It, Ast::Datatype(), Skipper>    datatype_;
        qi::rule<It, Ast::Value(), Skipper>       default_;
        qi::rule<It, Ast::Identifier(), Skipper>  inherit_;
        qi::rule<It, Ast::Verify(), Skipper>      verify_;
        qi::rule<It, Ast::Assign(), Skipper>      assign_;
        qi::rule<It, Ast::Code(), Skipper>        code_;
        qi::rule<It, Ast::Expression(), Skipper>  expr_, simple_expr;
        qi::rule<It, Ast::Conditional(), Skipper> conditional_, elsepart_, elseifpart_;
        qi::rule<It, Ast::Bioperator(), Skipper>  bioperator_;

        // lexemes:
        qi::rule<It, Ast::Identifier()> id_;
        qi::rule<It, Ast::Literal()>    literal_;
        qi::rule<It, Ast::Variant()>    variant_;
        qi::rule<It, Ast::Word()>       word_;
        qi::rule<It, Ast::Obj()>       obj_;
        qi::rule<It, Ast::Value()>      value_;
        qi::rule<It, Ast::Number()>     number_;
        qi::rule<It, Ast::BinOp()>      binop_, boolop_;
    };
}

#include <pugixml.hpp>
namespace Generate {
    using namespace Ast;

    struct XML {
        using Node = pugi::xml_node;

        // callable for variant visiting:
        template <typename T> void operator()(Node parent, T const& node) const { apply(parent, node); }

    private:
        template <typename... Ts>
        void apply(Node parent, boost::variant<Ts...> const& v) const {
            using std::placeholders::_1;
            boost::apply_visitor(std::bind(*this, parent, _1), v);
        }

        void apply(Node parent, Number const& num) const {
            named_child(parent, "num").text().set(num);
        }

        void apply(Node parent, Identifier const& id) const {
            named_child(parent, "identifier").text().set(id.c_str());
        }
        void apply(Node parent, Obj const& o) const {
            named_child(parent, "obj").text().set(o.c_str());
        }
        void apply(Node parent, Word const& w) const {
            named_child(parent, "word").text().set(w.c_str());
        }
        void apply(Node parent, Variant const& v) const {
            named_child(parent, "variant").text().set(v.c_str());
        }

        void apply(Node parent, Literal const& literal) const {
            named_child(parent, "literal").text().set(literal.c_str());
        }

        void apply(Node parent, Datatype const& datatype) const {
            named_child(parent, "datatype").text().set(datatype.c_str());
        }

        template <typename T> void apply(Node parent, boost::optional<T> const& opt) const {
            if (opt)
                apply(parent, *opt);
        }

        void apply(Node parent, Simple const& s) const {
            auto simple = named_child(parent, "simple");
            apply(simple, s.id);
            apply(simple, s.literal);
            apply(simple, s.enumeration);
            apply(simple, s.datatype);
            if (s.default_.has_value()) {
                apply(named_child(simple, "default"), *s.default_);
            }
        }

        void apply(Node parent, Enum const& e) const {
            auto enum_ = named_child(parent, "enumeration");
            for (auto& v : e)
                named_child(enum_, "word").text().set(v.c_str());
        }

        void apply(Node parent, Complex const& c) const {
            auto complex_ = named_child(parent, "complex");
            apply(complex_, c.id);
            for (auto& base : c.bases)
                apply(named_child(complex_, "inherit"), base);
            apply(complex_, c.literal);
            for (auto& m : c.members)
                apply(complex_, m);
        }

        void apply(Node parent, Container const& c) const {
            auto cont = named_child(parent, "container");
            apply(cont, c.id);
            apply(cont, c.literal);
            apply(cont, c.element);
        }

        void apply(Node parent, Assign const& a) const {
            auto asn_ = named_child(parent, "assign");
            apply(asn_, a.var);
            apply(asn_, a.value);
        }

        void apply(Node parent, Verify const& v) const {
            auto v_ = named_child(parent, "verify");
            apply(v_, v.word);
            apply(v_, v.num);
            apply(v_, v.obj);
        }

        //void apply(Node parent, Bioperator const& bo) const {
        //  auto botag = named_child(parent, "bioperator").text().set(bo.op.c_str());
        //  //apply(botag, bo.var);
        //  //apply(botag, bo.value);
        //}

        void apply(Node parent, Conditional const& c) const {
            auto task = named_child(parent, "not-implement-yet");
        }

        void apply(Node parent, Task const& t) const {
            auto task = named_child(parent, "task");
            for (auto& item : t) {
                std::string node = "class";
                if (item.type() == typeid(Code)) {
                    apply(task, item);
                }
                else if (item.type() == typeid(Class)) {
                    apply(task.append_child("class"), item);
                }
            }
        }

    private:
        Node named_child(Node parent, std::string const& name) const {
            auto child = parent.append_child();
            child.set_name(name.c_str());
            return child;
        }
    };
} // namespace Generate


int main() {
    using It = std::string::const_iterator;
    static const Parser::Task<It> p;
    static const Generate::XML to_xml;

    for (std::string const input : {
        R"(
        If (Var1 == "Test" && Var2 <= 10 && Var3 == "Done")
                    Verify Word 32 Objective;
                    If ((Var3 == "A" || Var4 == "B") && Var5 > 0)
                         Assign VarName "Value1";
                         Assign Var2 10;
                    Elseif (Var3 == "C")
                        Assign VarName "SomeValue"
                    End If;
                Else
                    Assign VarName "Value2"
                EndIf;
    )"
    }) {
        try {
            Ast::Task t;

            if (qi::parse(begin(input), end(input), p, t)) {
                pugi::xml_document doc;
                to_xml(doc.root(), t);
                doc.print(std::cout, "  ", pugi::format_default);
                std::cout << std::endl;
            }
            else {
                std::cout << " -> INVALID" << std::endl;
            }
        }
        catch (qi::expectation_failure<It> const& ef) {
            auto f = begin(input);
            auto p = ef.first - input.begin();
            auto bol = input.find_last_of("\r\n", p) + 1;
            auto line = std::count(f, f + bol, '\n') + 1;
            auto eol = input.find_first_of("\r\n", p);

            std::cerr << " -> EXPECTED " << ef.what_ << " in line:" << line << "\n"
                << input.substr(bol, eol - bol) << "\n"
                << std::setw(p - bol) << ""
                << "^--- here" << std::endl;
        }
    }
}

I can't seem to compose the grammar to handle the recursive (nested) if-else. The above is what I have. The grammar does not seem correct as it won't parse the example text.

Any help would be greatly appreciated.

Thanks

Dylan

1 Answer

I can't seem to compose the grammar to handle the recursive (nested) if-else.

Let's have a look (you can get undressed behind the curtain, the doctor will be with you shortly).

I took the code and reviewed it into the baseline version that indeed reproduces the error: https://compiler-explorer.com/z/4dnhM1a3a

However I fear your diagnosis is some ways off. I think you're underestimating the complexity of the endeavour. Specifically, the first compilation error doesn't even relate to the conditional statement, rather it relates to the boolean expression:

    expr_  = simple_expr | *(boolop_ >> bioperator_);

The structure of the rule does not match the AST:

struct Bioperator {
    Variable var;
    BinOp    op;
    Value    value;
};

There are good reasons to do that, but you will have to come up with the necessary glue, say "instructions", to tell Spirit how to compose your AST based on the synthesized attributes (which contain the repeated *() construct). I'd point at various expression parsers that I've worked on in the past on StackOverflow, but it looks like you might have used an example or two, just missing the glue.

Jumping ahead, imagining those basic problems "solved", I can see more dragons on the horizon, e.g.:

binop_  = string("==") | string("!=") | string(">") | string(">=") | string("<") | string("<=");

While I admire a propensity to keep things simple when possible, this misses more hidden complexity, e.g. that string("<") will by definition prevent string("<=") from ever matching. You could solve it by reordering to PEG-friendly order, but the safer (and more efficient) way is to use a symbol lookup.

qi::symbols<char> rel_ops_, bool_ops;
rel_ops_ += "==", "!=", ">", ">=", "<", "<=";
bool_ops += "||", "&&";
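To make the ordering problem concrete, here is a minimal sketch in plain C++ (no Spirit involved). The helper names first_match and longest_match are made up for this illustration: the first one tries candidates left to right the way a chain of alternatives does, the second one picks the longest candidate, which is the behaviour I'd expect from a symbol table lookup.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Ordered choice: return the first candidate that is a prefix of `input`.
// This mirrors how `a | b | c` alternatives are tried left to right.
std::string first_match(std::vector<std::string> const& ops, std::string_view input) {
    for (auto const& op : ops) {
        assert(!op.empty()); // an empty operator would match any input
        if (input.substr(0, op.size()) == std::string_view(op))
            return op;
    }
    return {};
}

// Longest match: return the longest candidate that is a prefix of `input`,
// regardless of the order in which the candidates are listed.
std::string longest_match(std::vector<std::string> const& ops, std::string_view input) {
    std::string best;
    for (auto const& op : ops) {
        assert(!op.empty());
        if (op.size() > best.size() && input.substr(0, op.size()) == std::string_view(op))
            best = op;
    }
    return best;
}
```

With the operators listed in the original order, first_match stops at "<" for the input "<= 10" and leaves the "=" behind, whereas longest_match returns "<=".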

I would never in a million years model the AST for operators as a string, but I'll leave that as an exorcism for the reader.

binop_  = raw[rel_ops_];
boolop_ = raw[bool_ops];

You probably sense that I think your naming is not accurate. After all, those boolean operators ARE binary operators, so maybe don't confuse yourself with arbitrary naming. This goes back to understanding your application domain accurately and sticking to it.

I started trying to fix the missing expression bits but I figured out that I lack the domain information. E.g. the expr_ production suggests that the LHS operand is a simple_expr; however, the Ast presents it as Variant (which I relabeled Variable to avoid confusion with the C++ variants used, and because my intuition tells me you actually need it to mean something like "an lvalue expression", aka something that can be assigned, i.e. a variable). I also noted that you commented out the expr_ branch in statement_, so I'll just skip it for now. Just for reference, here is a sketch of the direction I see this going (once you fix the Ast/production mismatches):

expr_       = simple_expr[_val = _1] >>
    *(boolop_ >> bioperator_)[_val = px::construct<Ast::Bioperator>(_val, _1, _2)];
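In case the semantic action looks like magic: each repetition takes the tree built so far and wraps it as the left operand of a new binary node. A rough sketch of that folding in plain C++ follows. Expr, leaf, fold_left and to_string are hypothetical names for this illustration only, not part of the grammar or of Spirit.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Expr;
using ExprPtr = std::shared_ptr<Expr>;

// Either a leaf (op empty, text set) or a binary node (op, lhs and rhs set).
struct Expr {
    std::string text;
    std::string op;
    ExprPtr     lhs, rhs;
};

ExprPtr leaf(std::string text) {
    return std::make_shared<Expr>(Expr{std::move(text), "", nullptr, nullptr});
}

// Fold `first (op operand)*` into a left-leaning tree: on every step the
// accumulated tree becomes the left child of a new binary node.
ExprPtr fold_left(ExprPtr first, std::vector<std::pair<std::string, ExprPtr>> const& rest) {
    ExprPtr acc = std::move(first);
    for (auto const& [op, operand] : rest)
        acc = std::make_shared<Expr>(Expr{"", op, acc, operand});
    return acc;
}

// Print with explicit parentheses so the nesting is visible.
std::string to_string(ExprPtr const& e) {
    assert(e); // every node in a well-formed tree is non-null
    if (e->op.empty())
        return e->text;
    return "(" + to_string(e->lhs) + " " + e->op + " " + to_string(e->rhs) + ")";
}
```

Folding a, then "&& b", then "&& c" this way gives ((a && b) && c), which is the same left-nested shape as the bioperator elements in the XML you expect.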

Similarly, code like this makes me confused:

void apply(Node parent, Task const& t) const {
    auto task = named_child(parent, "task");
    for (auto& item : t) {
        std::string node = "class";
        if (item.type() == typeid(Statement)) {
            apply(task, item);
        }
        else if (item.type() == typeid(ClassDef)) {
            apply(task.append_child("class"), item);
        }
    }
}

We structured the entire XML transformation around variant visitation, and here you hardcode a type-switch without actually differentiating on the item type after all. There's a spurious line std::string node = "class" which does nothing.
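For contrast, this is roughly what I mean by letting visitation do the dispatch. It is a stripped-down sketch that uses std::variant and std::visit as a stand-in for boost::variant and boost::apply_visitor, with toy ClassDef and Statement types and a made-up Render visitor, so treat every name in it as an assumption.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Toy stand-ins for the real AST node types.
struct ClassDef  { std::string name; };
struct Statement { std::string text; };
using TaskItem = std::variant<ClassDef, Statement>;

// One overload per alternative; overload resolution picks the right one,
// so no typeid comparison is needed anywhere.
struct Render {
    std::string operator()(ClassDef const& c) const { return "<class>" + c.name + "</class>"; }
    std::string operator()(Statement const& s) const { return "<statement>" + s.text + "</statement>"; }
};

std::string render_task(std::vector<TaskItem> const& task) {
    std::string out = "<task>";
    for (auto const& item : task) {
        assert(!item.valueless_by_exception()); // std::visit would throw otherwise
        out += std::visit(Render{}, item);
    }
    return out + "</task>";
}
```

The point is that the per-type decision (wrap in a class element or not) lives in the overload for that type, and the loop no longer needs to know which alternatives exist.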

Conditionals

Plugging that gap for now, and putting in your desired input, brings us to the next hurdle: Conditional.

The AST looks reasonable at first glance:

struct Conditional {
    Expression condition, true_block;
    boost::optional<Expression> false_block;
};

The weirdest thing is that all blocks are expressions, whereas your sample clearly suggests statements for the blocks, and expression for the condition.

Going to the productions, they are somehow all declared with Conditional as the corresponding attribute ¯\(ツ)/¯:

qi::rule<It, Ast::Conditional(), Skipper> conditional_,  elsepart_,   elseifpart_;

That makes little sense. So, from the example I expect the true/false branches to be at least Statement - or even Task?

Let's bend the AST to fit the sample input domain:

struct Conditional;

using Statement = boost::make_recursive_variant< //
    Verify,                                      //
    Assign,                                      //
    recursive_wrapper<Conditional>,              //
    std::vector<boost::recursive_variant_>       // a Block of Statement
    >::type;                                     //

struct Conditional {
    Expression                 condition;
    Statement                  true_block;
    boost::optional<Statement> false_block;
};

using Block = std::vector<Statement>; 

using Task = std::vector<boost::variant<ClassDef, Statement>>;

Next, the parser expression suggests that all branches are optional. Either that or the true-branch is completely missing, which also seems a problem. My suggestion follows.

To get elseif working I wouldn't do anything, except the same trick that separated "class" >> type from just type:

conditional_ = no_case["if"] >> condcore_ >> no_case["endif"] > ';';
condcore_    = expr_                                 //
    >> statement_                                    // true block
    >> -(no_case["elseif"] >> condcore_ | elsepart_) // false block
    ;                       //
elsepart_ = no_case["else"] >> statement_;

Note how Conditional is already part of Statement.

qi::rule<It, Ast::Conditional(), Skipper> conditional_, condcore_;
qi::rule<It, Ast::Statement(), Skipper>   elsepart_;
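The effect of reusing condcore_ for elseif is that an Elseif simply becomes another conditional nested in the false branch. A simplified sketch of the resulting shape, in plain C++: this Conditional is a hypothetical stand-in that keeps statements as text and is not the Ast::Conditional from the grammar.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in: the false branch is either a nested conditional
// (from "Elseif ...") or a plain list of statements (from "Else ...").
struct Conditional {
    std::string                  condition;
    std::vector<std::string>     true_block;
    std::shared_ptr<Conditional> elseif_branch; // non-null for an Elseif
    std::vector<std::string>     else_block;    // used for a plain Else
};

// Render the structure with braces so the nesting is visible.
std::string describe(Conditional const& c) {
    // A false branch is either an Elseif or an Else, never both.
    assert(!(c.elseif_branch && !c.else_block.empty()));

    std::string out = "if(" + c.condition + "){";
    for (auto const& s : c.true_block)
        out += s + ";";
    out += "}";

    if (c.elseif_branch) {
        out += "else{" + describe(*c.elseif_branch) + "}";
    } else if (!c.else_block.empty()) {
        out += "else{";
        for (auto const& s : c.else_block)
            out += s + ";";
        out += "}";
    }
    return out;
}
```

So If A ... Elseif C ... Else ... EndIf comes out as if(A){...}else{if(C){...}else{...}}, and only the outermost conditional needs to consume the EndIf.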

Demonstruction

This here be the result of lots of fiddling (you will be very familiar with that process. Only, for you it is by choice, and for me it is by necessity, for lack of information :))

I fiddled with the input as well, as your expression grammar is actually the more hobbled part for now. I feel you've been neglecting this core part from day 1 (actually, I was the one to notice the new requirement here:

The "arbitrary" appearance of a num XML element tells me that your language has a type system with ditto literals. And that the parser expression

default_ = -(lit("Default") >> alnum)

was likely woefully underspecified. Do yourself a favor and don't settle for a sloppy solution.

It looks like you keep accidentally expanding the scope with each question. Pretty soon you're writing a compiler by accident).

Live On Compiler Explorer

//#define BOOST_SPIRIT_DEBUG 1
#include <boost/fusion/adapted.hpp>
#include <boost/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
namespace qi = boost::spirit::qi;
namespace px = boost::phoenix;

namespace Ast {
    using boost::recursive_wrapper;

    template <typename> struct custom_string : std::char_traits<char> {};
    template <typename Tag>
    using String = std::basic_string<char, custom_string<Tag> >;

    using Identifier = String<struct TagId>;
    using Literal    = String<struct TagLiteral>;
    using Variable   = String<struct TagVariant>;
    using Word       = String<struct TagWord>;
    using Obj        = String<struct TagObj>;
    using BinOp      = String<struct TagOp>;

    using Datatype = String<struct TagDatatype>;
    struct Base {
        Identifier id;
        Literal literal;
    };

    using Ids = std::vector<Identifier>;
    using Enum = Ids;

    using Number = double;
    using Value = boost::variant<Literal, Number, Identifier, Variable>;

    struct Simple : Base {
        boost::optional<Enum>     enumeration;
        boost::optional<Datatype> datatype;
        boost::optional<Value>    default_;
    };

    struct Complex;
    struct Container;

    using ClassDef = boost::variant<
        Simple,
        recursive_wrapper<Complex>,
        recursive_wrapper<Container>
    >;

    using ClassDefs = std::vector<ClassDef>;
    struct Container : Base {
        ClassDef element;
    };
    struct Complex : Base {
        Ids       bases;
        ClassDefs members;
    };

    // Expression block
    struct Verify {
        Word   word;
        Number num;
        Obj    obj;
    };
    struct Assign {
        Variable var;
        Value    value;
    };

    struct Bioperator;
    struct Conditional;

    using Expression = boost::variant<Value, Verify, Assign, recursive_wrapper<Bioperator>>;

    struct Bioperator {
        Bioperator(Expression l = {}, BinOp o = {}, Expression r = {})
            : var(std::move(l))
            , op(std::move(o))
            , value(std::move(r)) {}
        Expression var;
        BinOp      op;
        Expression value;
    };

    using Statement = boost::make_recursive_variant< //
        std::vector<boost::recursive_variant_>,      // a Block of Statement
        Verify,                                      //
        Assign,                                      //
        recursive_wrapper<Conditional>               //
        >::type;                                     //

    using Block = std::vector<Statement>;

    struct Conditional {
        Expression             condition;
        Block                  true_block;
        boost::optional<Block> false_block;
    };

    using Task = std::vector<boost::variant<ClassDef, Statement>>;
} // namespace Ast

// Classes
BOOST_FUSION_ADAPT_STRUCT(Ast::Simple, id, literal, enumeration, datatype, default_)
BOOST_FUSION_ADAPT_STRUCT(Ast::Complex, id, literal, bases, members)
BOOST_FUSION_ADAPT_STRUCT(Ast::Container, id, literal, element)

// Expressions
BOOST_FUSION_ADAPT_STRUCT(Ast::Verify, word, num, obj);
BOOST_FUSION_ADAPT_STRUCT(Ast::Assign, var, value);
BOOST_FUSION_ADAPT_STRUCT(Ast::Bioperator, var, op, value);
BOOST_FUSION_ADAPT_STRUCT(Ast::Conditional, condition, true_block, false_block);

namespace Parser {
    template <typename It> struct Task : qi::grammar<It, Ast::Task()> {
        Task() : Task::base_type(start) {
            using namespace qi;

            start = skip(space)[task_];

            // lexemes:
            id_       = raw[alpha >> *(alnum | '_' | ':')];
            variable_ = id_;
            word_     = variable_;
            obj_      = word_;
            literal_  = '"' > *('\\' >> char_ | ~char_('"')) > '"';

            auto optlit = copy(literal_ | attr(std::string(" ")));

            task_         = *task_item > eoi;
            task_item     = classdef_ | statement_;
            subclass_     = simple_class_ | complex_ | container_;
            classdef_     = lit("Class") > subclass_ > ';';
            simple_class_ = lit("Simple") >> id_ >> optlit >> -enum_ >> -datatype_ >> -default_;
            inherit_      = lit("Inherit") >> id_;
            complex_      = lit("Complex") >> id_ >> optlit >> '(' >> *inherit_ >> *subclass_ >> ')';
            container_    = lit("Container") >> id_ >> optlit >> '(' >> subclass_ > ')';
            enum_         = lit("enumeration") >> '(' >> -(id_ % ',') > ')';
            datatype_     = lit("datatype") >> id_;
            value_        = literal_ | number_ | id_;
            number_       = double_;
            default_      = lit("Default") >> value_;

            // Expression
            statement_  = (assign_ | verify_ | conditional_) > ';'; // expr_
            block_      = *statement_;

            expr_ = simple_expr[_val = _1] //
                >> *(boolop_ >> expr_)[_val = px::construct<Ast::Bioperator>(_val, _1, _2)];

            simple_expr = value_ | bioperator_;
            bioperator_ = '(' >> variable_ >> binop_ >> value_ >> ')';
            assign_     = no_case["assign"] >> variable_ >> value_;
            verify_     = no_case["verify"] >> word_ >> number_ >> obj_;

            conditional_ = no_case["if"] >> condcore_ >> no_case["endif"];
            condcore_    = expr_                                 //
                >> block_                                        // true block
                >> -(no_case["elseif"] >> condcore_ | elsepart_) // false block
                ;                                                //
            elsepart_ = no_case["else"] >> block_;

            rel_ops_ += "==", "!=", ">", ">=", "<", "<=";
            bool_ops += "||", "&&";

            binop_  = raw[rel_ops_];
            boolop_ = raw[bool_ops];

            BOOST_SPIRIT_DEBUG_NODES(                                                         //
                (task_)(task_item)(classdef_)(subclass_)(simple_class_)(complex_)(container_) //
                (enum_)(datatype_)(default_)(inherit_)                                        //
                (id_)(literal_)(variable_)(word_)(value_)(number_)(obj_)                      //
                (expr_)(verify_)(assign_)(conditional_)(condcore_)(assign_)(binop_)(boolop_)  //
                (statement_)(block_)                                                          //
            )
        }

    private:
        qi::rule<It, Ast::Task()> start;
        qi::symbols<char> rel_ops_, bool_ops;

        using Skipper = qi::space_type;
        qi::rule<It, Ast::Task(),        Skipper> task_, task_item;
        qi::rule<It, Ast::ClassDef(),    Skipper> classdef_, subclass_;
        qi::rule<It, Ast::Simple(),      Skipper> simple_class_;
        qi::rule<It, Ast::Complex(),     Skipper> complex_;
        qi::rule<It, Ast::Container(),   Skipper> container_;
        qi::rule<It, Ast::Enum(),        Skipper> enum_;
        qi::rule<It, Ast::Datatype(),    Skipper> datatype_;
        qi::rule<It, Ast::Value(),       Skipper> default_;
        qi::rule<It, Ast::Identifier(),  Skipper> inherit_;
        qi::rule<It, Ast::Verify(),      Skipper> verify_;
        qi::rule<It, Ast::Assign(),      Skipper> assign_;
        qi::rule<It, Ast::Statement(),   Skipper> statement_;
        qi::rule<It, Ast::Expression(),  Skipper> expr_, simple_expr;
        qi::rule<It, Ast::Conditional(), Skipper> conditional_, condcore_;
        qi::rule<It, Ast::Statement(),   Skipper> elsepart_;
        qi::rule<It, Ast::Bioperator(),  Skipper> bioperator_;
        qi::rule<It, Ast::Block(),       Skipper> block_;

        // lexemes:
        qi::rule<It, Ast::Identifier()> id_;
        qi::rule<It, Ast::Literal()>    literal_;
        qi::rule<It, Ast::Variable()>   variable_;
        qi::rule<It, Ast::Word()>       word_;
        qi::rule<It, Ast::Obj()>        obj_;
        qi::rule<It, Ast::Value()>      value_;
        qi::rule<It, Ast::Number()>     number_;
        qi::rule<It, Ast::BinOp()>      binop_, boolop_;
    };
}

#include <pugixml.hpp>
namespace Generate {
    using namespace Ast;

    struct XML {
        using Node = pugi::xml_node;

        // callable for variant visiting:
        template <typename T> void operator()(Node parent, T const& node) const { apply(parent, node); }

    private:
        template <typename... Ts>
        void apply(Node parent, boost::variant<Ts...> const& v) const {
            using std::placeholders::_1;
            boost::apply_visitor(std::bind(*this, parent, _1), v);
        }

        void apply(Node parent, Number const& num) const {
            named_child(parent, "num").text().set(num);
        }

        void apply(Node parent, Identifier const& id) const {
            named_child(parent, "identifier").text().set(id.c_str());
        }
        void apply(Node parent, Obj const& o) const {
            named_child(parent, "obj").text().set(o.c_str());
        }
        void apply(Node parent, Word const& w) const {
            named_child(parent, "word").text().set(w.c_str());
        }
        void apply(Node parent, Variable const& v) const {
            named_child(parent, "variant").text().set(v.c_str());
        }

        void apply(Node parent, Literal const& literal) const {
            named_child(parent, "literal").text().set(literal.c_str());
        }

        void apply(Node parent, Datatype const& datatype) const {
            named_child(parent, "datatype").text().set(datatype.c_str());
        }

        template <typename T> void apply(Node parent, boost::optional<T> const& opt) const {
            if (opt)
                apply(parent, *opt);
        }

        void apply(Node parent, Simple const& s) const {
            auto simple = named_child(parent, "simple");
            apply(simple, s.id);
            apply(simple, s.literal);
            apply(simple, s.enumeration);
            apply(simple, s.datatype);
            if (s.default_.has_value()) {
                apply(named_child(simple, "default"), *s.default_);
            }
        }

        void apply(Node parent, Enum const& e) const {
            auto enum_ = named_child(parent, "enumeration");
            for (auto& v : e)
                named_child(enum_, "word").text().set(v.c_str());
        }

        void apply(Node parent, Complex const& c) const {
            auto complex_ = named_child(parent, "complex");
            apply(complex_, c.id);
            for (auto& base : c.bases)
                apply(named_child(complex_, "inherit"), base);
            apply(complex_, c.literal);
            for (auto& m : c.members)
                apply(complex_, m);
        }

        void apply(Node parent, Container const& c) const {
            auto cont = named_child(parent, "container");
            apply(cont, c.id);
            apply(cont, c.literal);
            apply(cont, c.element);
        }

        void apply(Node parent, Assign const& a) const {
            auto asn_ = named_child(parent, "assign");
            apply(asn_, a.var);
            apply(asn_, a.value);
        }

        void apply(Node parent, Verify const& v) const {
            auto v_ = named_child(parent, "verify");
            apply(v_, v.word);
            apply(v_, v.num);
            apply(v_, v.obj);
        }

        void apply(Node parent, Bioperator const& bo) const {
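            auto botag = named_child(parent, "bioperator"); // keep the node, not the bool returned by set()
            botag.text().set(bo.op.c_str());                // only the operator is emitted for now
            // TODO: recurse into the operands so nested expressions show up too:
            //apply(botag, bo.var);
            //apply(botag, bo.value);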
        }

        void apply(Node parent, Conditional const& c) const {
            auto cond = named_child(parent, "conditional");
            apply(named_child(cond, "expression"), c.condition);
            apply(named_child(cond, "true_branch"), c.true_block);
            if (c.false_block)
                apply(named_child(cond, "false_branch"), *c.false_block);
        }

        void apply(Node parent, Block const& b) const {
            auto block = named_child(parent, "block");
            for (auto& s : b)
                apply(block, s);
        }

        void apply(Node parent, Task const& t) const {
            auto task = named_child(parent, "task");
            for (auto& item : t)
                apply(named_child(task, "item"), item);
        }

    private:
        Node named_child(Node parent, std::string const& name) const {
            auto child = parent.append_child();
            child.set_name(name.c_str());
            return child;
        }
    };
} // namespace Generate

static const std::string cases[] = {
    R"(
        If (Var1 == "Test") && (Var2 <= 10) && (Var3 == "Done")
          Verify Word 32 Objective;
          Assign VarName "Value2";
        EndIf;)",
    R"(
        If (Var1 == "Test") && (Var2 <= 10) && (Var3 == "Done")
          Verify Word 32 Objective;
        Else
          Assign VarName "Value2";
        EndIf;)",
    R"(
        Verify Word 32 Objective;
        If (Var3 == "A")
           Assign VarName "Value1";
           Assign Var2 10;
        Elseif (Var3 == "C")
          Assign VarName "SomeValue";
        EndIf;)",
    R"(
        If (Var1 == "Test") && (Var2 <= 10) && (Var3 == "Done")
          Verify Word 32 Objective;
          If (Var3 == "A") || (Var4 == "B") && (Var5 > 0)
             Assign VarName "Value1";
             Assign Var2 10;
          Elseif (Var3 == "C")
            Assign VarName "SomeValue";
          EndIf;
        Else
          Assign VarName "Value2";
        EndIf;)",
};

int main() {
    using It = std::string::const_iterator;
    static const Parser::Task<It> p;
    static const Generate::XML to_xml;

    for (int i = 0; std::string const& input : cases) {
        try {
            Ast::Task t;

            std::cout << "*** Sample #" << ++i << std::endl;
            if (qi::parse(begin(input), end(input), p, t)) {
                pugi::xml_document doc;
                to_xml(doc.root(), t);
                doc.print(std::cout, "  ", pugi::format_default);
                std::cout << std::endl;
            } else {
                std::cout << " -> INVALID" << std::endl;
            }
        } catch (qi::expectation_failure<It> const& ef) {
            auto f = begin(input);
            auto p = ef.first - input.begin();
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wsign-conversion"
            auto bol  = input.find_last_of("\r\n", p) + 1;
            auto line = std::count(f, f + bol, '\n') + 1;
            auto eol  = input.find_first_of("\r\n", p);

            std::cerr << " -> EXPECTED " << ef.what_ << " in line:" << line << "\n"
                      << input.substr(bol, eol - bol) << "\n"
                      << std::setw(p - bol) << ""
                      << "^--- here" << std::endl;
#pragma GCC diagnostic pop
        }
    }
}

I didn't bother to complete the XML generation step, so the output is limited (e.g. only the top-level operator of each condition is emitted):

*** Sample #1
<task>
  <item>
    <conditional>
      <expression>
        <bioperator>&amp;&amp;</bioperator>
      </expression>
      <true_branch>
        <block>
          <verify>
            <word>Word</word>
            <num>32</num>
            <obj>Objective</obj>
          </verify>
          <assign>
            <variant>VarName</variant>
            <literal>Value2</literal>
          </assign>
        </block>
      </true_branch>
    </conditional>
  </item>
</task>

*** Sample #2
<task>
  <item>
    <conditional>
      <expression>
        <bioperator>&amp;&amp;</bioperator>
      </expression>
      <true_branch>
        <block>
          <verify>
            <word>Word</word>
            <num>32</num>
            <obj>Objective</obj>
          </verify>
        </block>
      </true_branch>
      <false_branch>
        <block>
          <block>
            <assign>
              <variant>VarName</variant>
              <literal>Value2</literal>
            </assign>
          </block>
        </block>
      </false_branch>
    </conditional>
  </item>
</task>

*** Sample #3
<task>
  <item>
    <verify>
      <word>Word</word>
      <num>32</num>
      <obj>Objective</obj>
    </verify>
  </item>
  <item>
    <conditional>
      <expression>
        <bioperator>==</bioperator>
      </expression>
      <true_branch>
        <block>
          <assign>
            <variant>VarName</variant>
            <literal>Value1</literal>
          </assign>
          <assign>
            <variant>Var2</variant>
            <num>10</num>
          </assign>
        </block>
      </true_branch>
      <false_branch>
        <block>
          <conditional>
            <expression>
              <bioperator>==</bioperator>
            </expression>
            <true_branch>
              <block>
                <assign>
                  <variant>VarName</variant>
                  <literal>SomeValue</literal>
                </assign>
              </block>
            </true_branch>
          </conditional>
        </block>
      </false_branch>
    </conditional>
  </item>
</task>

*** Sample #4
<task>
  <item>
    <conditional>
      <expression>
        <bioperator>&amp;&amp;</bioperator>
      </expression>
      <true_branch>
        <block>
          <verify>
            <word>Word</word>
            <num>32</num>
            <obj>Objective</obj>
          </verify>
          <conditional>
            <expression>
              <bioperator>||</bioperator>
            </expression>
            <true_branch>
              <block>
                <assign>
                  <variant>VarName</variant>
                  <literal>Value1</literal>
                </assign>
                <assign>
                  <variant>Var2</variant>
                  <num>10</num>
                </assign>
              </block>
            </true_branch>
            <false_branch>
              <block>
                <conditional>
                  <expression>
                    <bioperator>==</bioperator>
                  </expression>
                  <true_branch>
                    <block>
                      <assign>
                        <variant>VarName</variant>
                        <literal>SomeValue</literal>
                      </assign>
                    </block>
                  </true_branch>
                </conditional>
              </block>
            </false_branch>
          </conditional>
        </block>
      </true_branch>
      <false_branch>
        <block>
          <block>
            <assign>
              <variant>VarName</variant>
              <literal>Value2</literal>
            </assign>
          </block>
        </block>
      </false_branch>
    </conditional>
  </item>
</task>
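
The output above only carries the top-level operator of each condition, because the Bioperator case doesn't recurse into its operands. Here is a minimal sketch of what that recursive step could look like. It assumes a simplified stand-in AST: the Variable, Literal, Binary and Expr types and the bin, escape and to_xml helpers are all hypothetical, not the Ast:: types from the answer, and it builds plain strings instead of pugixml nodes (C++17):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical, simplified expression AST (NOT the answer's Ast:: types).
struct Variable { std::string name; };
struct Literal  { std::string text; };
struct Binary;
using Expr = std::variant<Variable, Literal, int, std::shared_ptr<Binary>>;

struct Binary {
    std::string op;
    Expr lhs, rhs;
};

// Convenience builder for a binary node (hypothetical helper).
Expr bin(std::string op, Expr lhs, Expr rhs) {
    return std::make_shared<Binary>(Binary{std::move(op), std::move(lhs), std::move(rhs)});
}

// Escape the characters that are special in XML text content.
std::string escape(std::string const& s) {
    std::string out;
    for (char c : s) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;
        }
    }
    return out;
}

std::string to_xml(Expr const& e);

// One overload per AST alternative; the Binary case recurses into both operands.
struct Emit {
    std::string operator()(Variable const& v) const { return "<variant>" + escape(v.name) + "</variant>"; }
    std::string operator()(Literal const& l) const  { return "<literal>" + escape(l.text) + "</literal>"; }
    std::string operator()(int n) const             { return "<num>" + std::to_string(n) + "</num>"; }
    std::string operator()(std::shared_ptr<Binary> const& b) const {
        assert(b && "binary node must not be null");
        // Operator text first, then the operands, like the layout in the question.
        return "<bioperator>" + escape(b->op) + to_xml(b->lhs) + to_xml(b->rhs) + "</bioperator>";
    }
};

std::string to_xml(Expr const& e) { return std::visit(Emit{}, e); }
```

For example, `to_xml(bin("&&", bin("==", Variable{"Var1"}, Literal{"Test"}), bin("<=", Variable{"Var2"}, 10)))` yields nested `<bioperator>` elements. Grouping parentheses need no node of their own here: the nesting of the tree already encodes them.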

Closing Notes

I glossed over a ton of things. Note, e.g., that you were gearing up to put Conditional inside Expression. All your examples imply that you need it to act like a nesting statement instead. You might need a conditional expression (popularly known as the ternary operator) as well, like ?: in C, CASE WHEN in SQL, or a if cond else b in Python. I'll leave it out because none of your examples use it, and also because your expression syntax needs a lot of work generally.

It all keeps coming down to this: Combat your complexity by being accurate about your domain semantics. As always, you can only solve the problem that you understand, and as long as (small) confusions creep in, they compound to insurmountable accidental complexity.

I think sometimes the professional thing to do is to stop the wild goose chase, ask for complete specs and wait for them before implementing anything.

I can almost guarantee a complete specification exists, since complicated grammars usually don't happen by accident (as you are finding out the hard way), and the apparent need for automated tooling reinforces the idea that this infrastructure is central to some part of a software development process. Ask for it; don't just keep working in the blind.

sehe
  • This will be the last episode I spend on this project, for exactly the reasons I give in the "Closing Notes". You can ask me again, nicely, when the specifications arrive, but I would probably only have time if there's a paid consultancy position :) – sehe Jun 21 '23 at 23:26
  • Thanks for taking the time to explain and provide a working example. Your advice is well taken as I move on with this project. I like the **You can ask me again, nicely**. So here it comes...:) Is it possible to break the grammar into multiple classes, grouping all related rules into one class (in a separate file) for future expansion? I'm thinking all the AST will be in one file. The grammar would go like **expression class** <- **class_block class** <- **code_block class** <-- **main class** (the main class contains all rules inherited from the other classes) <-- **parser main function** – Dylan Jun 22 '23 at 15:05
  • @Dylan Yes, that's actually recommended, and also how the compiler examples in the Boost Spirit examples do it. I admit I never do, because I frankly feel that Spirit shines more in smaller, rapid-turnaround parsers. Otherwise I'd seriously consider handrolling or something like flex/bison and ANTLR – sehe Jun 22 '23 at 15:15
  • I just re-read your comment, and I would *not* inherit (it conflicts with the parser interface). Instead just aggregate the grammars exactly like a grammar aggregates rules now. – sehe Jun 22 '23 at 15:17
  • Thanks for the quick answer...:) – Dylan Jun 22 '23 at 15:29
  • As I went through the code above, you mentioned that _the **expression** syntax needs a lot more work_, which is totally true: I notice we are missing the part that captures all the expressions of the **If and ElseIf statements**. For example, `If (Var1 == "Test") && (Var2 <= 10) && (Var3 == "Done")` is captured in the AST as a single `&&` bioperator. How would I modify the Conditional struct to make it store multiple expressions? Maybe change to `std::vector<Expression> conditions` instead of `Expression condition`? – Dylan Jun 22 '23 at 16:58
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/254193/discussion-between-dylan-and-sehe). – Dylan Jun 22 '23 at 17:06
  • For the `If-ElseIf-Else` rule, I want to capture the `ElseIf` statement on its own. I tried to modify the rule but it keeps throwing errors. It's just so that, when generating XML, the `if`, `elseif` and `else` generate different blocks of XML. The current recursive rule treats every elseif as an `else` followed by an `if` block. – Dylan Jul 14 '23 at 17:26
  • It has nothing to do with the parsing. Not even the semantics (they're correct already). You just want to generate different/more specific XML. As always: start with understanding your domain/requirements and then implement _those_. (Concretely: don't modify the parser if you want to have different output. Don't modify the AST unless your AST needs to be different) – sehe Jul 14 '23 at 19:56
  • Maybe I should rephrase my question. With the current grammar, is there a way to differentiate Else and ElseIf in the AST? In the AST, elseif and else is treated as false_statement. I tried to differentiate them when generating XML but can't find a way to tell whether this is the ELSEIF or the ELSE. – Dylan Jul 14 '23 at 20:12
  • "elseif and else is treated as false_statement" - well, obviously. They **are** both else statements. It's just that the **contents** of an else-if branch _happens to be_ another conditional statement. So, pardon the potentially confusing wording, you're one if-statement away from distinguishing between the two. Note, though, that depending on how accurately you wrap the AST nodes in redundant statement `Block`s, there may not be a good way to distinguish between `if (a) {} else { if (b) {} }` and `if (a) {} elseif (b) {}`. The solution is obviously to maintain the redundant block scope in the AST then – sehe Jul 14 '23 at 21:44
  • I got it to do what I want this way. First, introduce a new struct `struct IECond { Expression condition; Block block;}`, then tweak the current Conditional struct this way: `struct Conditional { IECond if_block; std::vector<IECond> elif_block; boost::optional<Block> else_block;}`, and for the grammar I did this: `ifel_block_ = no_case["if"] >> condition_core_ >> no_case["endif"]; condition_core_ = ie_cond_ >> *(no_case["elseif"] >> ie_cond_) >> -(no_case["else"] >> stmt_block_); ie_cond_ = '(' >> expr_ >> ')' >> stmt_block_;` Now the ElseIf blocks are captured separately from the Else block. – Dylan Jul 14 '23 at 21:59
  • Yeah that works, at the cost of duplicating something that is semantically identical. However, it **does** save you a semantic switch during XML generation (by replacing it with static type switch). It's up to you to decide whether this duplication is worth the maintenance cost. – sehe Jul 14 '23 at 22:03
  • **Semantic switch in XML generation**: I was trying to do this at first but couldn't find a way to check and switch based on the type to render XML. Can you share some tips? :) – Dylan Jul 14 '23 at 22:13
  • I obviously do not know what your current code looks like, but this is how it [would look in my example](https://compiler-explorer.com/z/P5Y4r168r). Note it also deals with simplifying one-statement blocks during XML generation. You might want to flip that around, making the `statement` rule create blocks only on multi-statements. Output locally: https://i.imgur.com/gWAizow.png – sehe Jul 15 '23 at 14:39
  • Thanks @sehe. I would not have been able to come up with what you have ..:). I was trying to check, when generating else_block, whether a condition exists, in which case it is an elseif. But it doesn't work right. Anyway, thanks for the example tweak. – Dylan Jul 15 '23 at 21:41
  • I'm trying to detect the Bioperator grouping like ` if ((a == b && a == c) || a > 0)` to add `` tag around the bioperator when they are inside the inner parenthesis. I'm using semantic actions in [This Code](https://compiler-explorer.com/z/bs8GW8xK9). See the **Bioperator class** where I add `bool group` and `set_group_flag` function. This function will set the current expression (if it's a Bioperator type) group to true. Here's the rule `factor_= value_ | '(' >> expr_[_val = px::bind(&Ast::Bioperator::set_group_flag, _1)] >> ')'`. It doesn't compile. Is it the right way? – Dylan Jul 20 '23 at 18:38
  • Do yourself a favor, and make the parentheses *only* part of the grammar to specify overrides of standard precedence/associativity. The AST nodes have natural precedence as reflected in the node graph. Whatever you do, a "group flag" is not what you want. Your princess is in another castle. Please review one of my many expression grammar examples on this site if you want examples. – sehe Jul 20 '23 at 19:00
  • Right now I'm going to just repeat my [stance of Jun 21st](https://stackoverflow.com/questions/76526586/need-help-parsing-complex-if-elseif-else-statement-to-xml-with-boost-spirit-in-c/76527728?noredirect=1#comment134931693_76527728): I cannot keep doing this project for you. Well, in fact I probably can, but I'd do it on my terms, and step one would be to negotiate a price and know the domain requirements. – sehe Jul 20 '23 at 19:00
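
On the Else vs. ElseIf question from the comments: the "one if-statement away" check can be sketched roughly like this. It is only an illustration under assumptions. The Assign, Statement, Block and Conditional types are simplified stand-ins for the Ast:: types in the answer, the helper names as_elseif, block_xml and if_xml are hypothetical, and the XML is built as plain strings without escaping (C++17). As noted in the comments, an else-branch containing nothing but an explicit nested If is indistinguishable from an ElseIf unless the AST keeps the redundant block scope, so this sketch treats both the same:

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified statement AST (NOT the answer's Ast:: types).
struct Conditional;
struct Assign { std::string var, value; };
using Statement = std::variant<Assign, std::shared_ptr<Conditional>>;
using Block     = std::vector<Statement>;

struct Conditional {
    std::string          condition;   // kept as plain text for brevity
    Block                true_block;
    std::optional<Block> false_block;
};

// The "one if-statement": an else-branch that consists of exactly one
// conditional statement is rendered as an ElseIf. Returns nullptr otherwise.
Conditional const* as_elseif(std::optional<Block> const& else_block) {
    if (else_block && else_block->size() == 1)
        if (auto p = std::get_if<std::shared_ptr<Conditional>>(&else_block->front()))
            return p->get();
    return nullptr;
}

std::string if_xml(Conditional const& c);

std::string block_xml(Block const& b) {
    std::string out;
    for (auto const& s : b) {
        if (auto a = std::get_if<Assign>(&s)) {
            out += "<assign>" + a->var + "=" + a->value + "</assign>";
        } else if (auto c = std::get_if<std::shared_ptr<Conditional>>(&s)) {
            assert(*c && "conditional statement must not be null");
            out += if_xml(**c);
        }
    }
    return out;
}

// Flattens the else -> if chain into sibling <elseif> elements.
std::string if_xml(Conditional const& c) {
    std::string out = "<if><cond>" + c.condition + "</cond><then>" + block_xml(c.true_block) + "</then>";
    Conditional const* cur = &c;
    while (auto next = as_elseif(cur->false_block)) {
        out += "<elseif><cond>" + next->condition + "</cond><then>" +
               block_xml(next->true_block) + "</then></elseif>";
        cur = next;
    }
    if (cur->false_block) // whatever is left is a genuine Else
        out += "<else>" + block_xml(*cur->false_block) + "</else>";
    return out + "</if>";
}
```

The parser and the AST stay as they are; only the generator decides how to label the branch, which is the trade-off against the separate `IECond` struct discussed above.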