Boost::Spirit doubles character when followed by a default value

Question

I use boost::spirit to parse (a part) of a monomial like x, y, xy, x^2, x^3yz. I want to save the variables of the monomial into a map, which also stores the corresponding exponent. Therefore the grammar should also save the implicit exponent of 1 (so x stores as if it was written as x^1).

start = +(potVar);
potVar=(varName>>'^'>>exponent)|(varName>> qi::attr(1));// First try: This doubles the variable name
//potVar = varName >> (('^' >> exponent) |  qi::attr(1));// Second try: This works as intended
exponent = qi::int_;
varName  = qi::char_("a-z");

When using the default attribute as in the line "First try", Spirit doubles the variable name.
Everything works as intended when using the default attribute as in the line "Second try".

'First try' reads a variable x and stores the pair [xx, 1].
'Second try' reads a variable x and stores the pair [x, 1].
I think I solved the original problem myself. The second try works. However, I don't see how I doubled the variable name. Because I am about to get familiar with boost::spirit, which is a collection of challenges for me, and there are probably more to come, I would like to understand this behavior.

This is the whole code to recreate the problem. The frame of the grammar is copied from a presentation of the KIT https://panthema.net/2018/0912-Boost-Spirit-Tutorial/ , and Stackoverflow was already very helpful, when I needed the header, which enables me to use the std::pair.

#include <iostream>
#include <iomanip>
#include <stdexcept>
#include <cmath>
#include <map>
#include <utility>//for std::pair

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp> //https://stackoverflow.com/questions/53953642/parsing-map-of-variants-with-boost-spirit-x3

namespace qi = boost::spirit::qi;

template <typename Parser, typename Skipper, typename ... Args>
void PhraseParseOrDie(
    const std::string& input, const Parser& p, const Skipper& s,
    Args&& ... args)
{
    std::string::const_iterator begin = input.begin(), end = input.end();
    boost::spirit::qi::phrase_parse(
        begin, end, p, s, std::forward<Args>(args) ...);
    if (begin != end) {
        std::cout << "Unparseable: "
            << std::quoted(std::string(begin, end)) << std::endl;
        throw std::runtime_error("Parse error");
    }
}

class ArithmeticGrammarMonomial : public qi::grammar<
    std::string::const_iterator,
    std::map<std::string, int>(), qi::space_type>
{
public:
    using Iterator = std::string::const_iterator;

    ArithmeticGrammarMonomial() : ArithmeticGrammarMonomial::base_type(start)
    {
        start = +(potVar);
        potVar=(varName>>'^'>>exponent)|(varName>> qi::attr(1));
        //potVar = varName >> (('^' >> exponent) |  qi::attr(1));
        exponent = qi::int_;
        varName  = qi::char_("a-z");
    }


    qi::rule<Iterator, std::map<std::string, int>(), qi::space_type> start;
    qi::rule<Iterator, std::pair<std::string, int>(), qi::space_type> potVar;
    qi::rule<Iterator, int()> exponent;
    qi::rule<Iterator, std::string()> varName;
};

void test2(std::string input)
{
    
    std::map<std::string, int> out_map;
    PhraseParseOrDie(input, ArithmeticGrammarMonomial(), qi::space, out_map);

    std::cout << "test2() parse result: "<<std::endl;
    for(auto &it: out_map)
        std::cout<< it.first<<it.second << std::endl;
}

/******************************************************************************/

int main(int argc, char* argv[])
{
    std::cout << "Parse Monomial 1" << std::endl;
    test2(argc >= 2 ? argv[1] : "x^3y^1");
    test2(argc >= 2 ? argv[1] : "xy");
    return 0;
}

Live demo

sehe · Accepted Answer · 2022-01-10T13:19:24.103

I think I solved the original problem myself. The second try works.

Indeed. It's how I'd do this (always match the AST with your parser expressions).

However, I don't see how I doubled the variable name.

It's due to backtracking with container attributes. They don't get rolled back. So the first branch parses potVar into a string, and then the parser backtracks into the second branch, which parses potVar into the same string.

It can also crop up with semantic actions:

In short:

match your AST structure in your rule expression, or use qi::hold to force the issue (at performance cost)
avoid semantic actions (Boost Spirit: "Semantic actions are evil"?)

For inspiration, here's a simplified take using Spirit X3

Live On Compiler Explorer

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <fmt/ranges.h>
#include <map>

namespace Parsing {
    namespace x3 = boost::spirit::x3;

    auto exponent = '^' >> x3::int_ | x3::attr(1);
    auto varName  = x3::repeat(1)[x3::char_("a-z")];

    auto potVar
        = x3::rule<struct P, std::pair<std::string, int>>{}
        = varName >> exponent;
    auto start  = x3::skip(x3::space)[+potVar >> x3::eoi];

    template <typename T = x3::unused_type>
    void StrictParse(std::string_view input, T&& into = {})
    {
        auto f = input.begin(), l = input.end();

        if (!x3::parse(f, l, start, into)) {
            fmt::print(stderr, "Error at: '{}'\n", std::string(f, l));
            throw std::runtime_error("Parse error");
        }
    }
} // namespace Parsing

void test2(std::string input) {
    std::map<std::string, int> out_map;
    Parsing::StrictParse(input, out_map);

    fmt::print("{} -> {}\n", input, out_map);
}

int main() {
    for (auto s : {"x^3y^1", "xy"})
        test2(s);
}

Prints

x^3y^1 -> [("x", 3), ("y", 1)]
xy -> [("x", 1), ("y", 1)]

Bonus Notes

It looks to me like you should be more careful. Even if you assume that all variables are 1 letter and no terms can occur (only factors), then still you need to correctly handle x^5y^2x to be x^6y^2 right?

Here's Qi version that uses semantic actions to correctly accumulate like factors:

Live On Coliru

#include <boost/spirit/include/phoenix.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <iostream>
#include <map>

namespace qi   = boost::spirit::qi;
using Iterator = std::string::const_iterator;
using Monomial = std::map<char, int>;

struct ArithmeticGrammarMonomial : qi::grammar<Iterator, Monomial()> {
    ArithmeticGrammarMonomial() : ArithmeticGrammarMonomial::base_type(start) {
        using namespace qi;
        exp_  = '^' >> int_ | attr(1);
        start = skip(space)[                        //
            +(char_("a-z") >> exp_)[_val[_1] += _2] //
        ];
    }

  private:
    qi::rule<Iterator, Monomial()>            start;
    qi::rule<Iterator, int(), qi::space_type> exp_;
};

void do_test(std::string_view input) {
    Monomial output;

    static const ArithmeticGrammarMonomial p;
    Iterator f(begin(input)), l(end(input));
    qi::parse(f, l, qi::eps > p, output);

    std::cout << std::quoted(input) << " -> " << std::endl;
    for (auto& [var,exp] : output)
        std::cout << " - " << var << '^' << exp << std::endl;
}

int main() {
    for (auto s : {"x^3y^1", "xy", "x^5y^2x"})
        do_test(s);
}

Prints

"x^3y^1" ->
 - x^3
 - y^1
"xy" ->
 - x^1
 - y^1
"x^5y^2x" ->
 - x^6
 - y^2

Added the bonus take in Qi **[Live On Coliru](http://coliru.stacked-crooked.com/a/20b755950e384126)** (and here for X3: https://godbolt.org/z/z6xddcc7n) — sehe, Jan 10 '22 at 13:29
"It's due to backtracking with container attributes. " That would be sufficient as an answer. A pity, that I can't upvote more to honor the extra effort. — merkatorix, Jan 10 '22 at 14:32
May I ask multiple follow-up questions? Why did you use x3 at first and then qi? Is this the answer? https://stackoverflow.com/questions/45457868/statefulness-of-spirit-v2-and-x3 You like x3, but it does not allow for the states? Which means getting into qi at this moment will still pay off and be compileable in 10 years? — merkatorix, Jan 10 '22 at 15:14
I wished I could debug your qi code to understand it (A real problem I have with Boost::Spirit, I already started to use Antlr just for the graph). How does it count each exponent separately? I would have assumed that it just adds up all exponents. However, I see suspicious underscores. — merkatorix, Jan 10 '22 at 15:14
X3 is more lightweight and allows for more modern C++ (c++14+). The post you [link](https://stackoverflow.com/questions/45457868/statefulness-of-spirit-v2-and-x3) marks one of the the critical differences. In general I'd say Qi is simpler to use but uses more magic under the hood. X3 is just more lightweight but requires the user to supply more plumbing. In terms of support, Qi is still the current (X3 is technically experimental) — sehe, Jan 10 '22 at 19:57
You can use `BOOST_SPIRIT_DEBUG` like this: http://coliru.stacked-crooked.com/a/b662b362a2f548fe. To also track the semantic action, you will have elaborate a little bit: http://coliru.stacked-crooked.com/a/2e29cbe491e83ed8. I assume that clarifies? — sehe, Jan 10 '22 at 20:17
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/240937/discussion-between-sehe-and-merkatorix). — sehe, Jan 10 '22 at 20:17

Boost::Spirit doubles character when followed by a default value

1 Answers1

Bonus Notes

Linked