Boost spirit: how to use custom logic when parsing a list of doubles with text specifiers

Question

I would like to parse a vector of doubles. However this vector may also contain two types of statements that compress the data somewhat: FOR and RAMP.

If FOR is in the string, it should be of the format "<double> FOR <int>". It means repeat the <double> <int> times.

For example "1 1.5 2 2.5 3 FOR 4 3.5" should parse to { 1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5 }

If RAMP is in the string, it should be of the format "<double1> RAMP <int> <double2>". It means linearly interpolate between <double1> and <double2> over <int> periods.

For example "1 2 3 4 RAMP 3 6 7 8" should parse to { 1, 2, 3, 4, 5, 6, 7, 8 }

I'm at a loss at how to proceed beyond defining the parsers for the individual elements. How do I supply custom code to perform the expansions when they are encountered?

Thanks!

score 1 · Answer 1 · answered Dec 05 '19 at 17:45

The simplest without semantic actions¹ would be to parse into an AST which you then interpret.

The more tedious approach would be to use semantic actions to build the result. (Keep in mind this gets problematic with backtracking grammars.)

Using an AST Representation

An example AST:

namespace AST {
    using N = unsigned long;
    using V = double;

    struct repeat { N n; V value; };
    struct interpolate {
        N n; V start, end;
        bool is_valid() const;
    };

    using element = boost::variant<repeat, interpolate>;
    using elements = std::vector<element>;

The is_valid is a good place where we can do logic asserts like "the number of periods isn't zero" or "if the number of periods is 1, start and end must coincide".

Now, for our end-result we want to have a translation to just-a-vector-of-V:

    using values = std::vector<V>;

    static inline values expand(elements const& v) {
        struct {
            values result;
            void operator()(repeat const& e) {
                result.insert(result.end(), e.n, e.value);
            }
            void operator()(interpolate const& e) {
                if (!e.is_valid()) {
                    throw std::runtime_error("bad interpolation");
                }
                if (e.n>0) { result.push_back(e.start); }
                if (e.n>2) {
                    auto const delta = (e.end-e.start)/(e.n-1);
                    for (N i=1; i<(e.n-1); ++i)
                        result.push_back(e.start + i * delta);
                }
                if (e.n>1) { result.push_back(e.end); }
            }
        } visitor;
        for (auto& el : v) {
            boost::apply_visitor(visitor, el);
        }
        return std::move(visitor.result);
    }
}

Now that we have the basics down, let's parse and test:

Parsing

First, let's adapt the AST types:

BOOST_FUSION_ADAPT_STRUCT(AST::repeat, value, n)
BOOST_FUSION_ADAPT_STRUCT(AST::interpolate, start, n, end)

Note: the "natural grammar order" of the adapted properties makes attribute propagation painless without semantic actions

Now let's roll a grammar:

namespace qi = boost::spirit::qi;

template <typename It> struct Grammar : qi::grammar<It, AST::elements()> {
    Grammar() : Grammar::base_type(start) {
        elements_ = *element_;
        element_ = interpolate_ | repeat_;
        repeat_
            = value_ >> "FOR" >> qi::uint_
            | value_ >> qi::attr(1u)
            ;
        interpolate_
            = value_ >> "RAMP" >> qi::uint_ >> value_
            ;

        value_ = qi::auto_;

        start = qi::skip(qi::space) [ elements_ ];

        BOOST_SPIRIT_DEBUG_NODES((start)(elements_)(element_)(repeat_)(interpolate_)(value_))
    }
  private:
    qi::rule<It, AST::elements()> start;
    qi::rule<It, AST::elements(),    qi::space_type> elements_;
    qi::rule<It, AST::element(),     qi::space_type> element_;
    qi::rule<It, AST::repeat(),      qi::space_type> repeat_;
    qi::rule<It, AST::interpolate(), qi::space_type> interpolate_;
    qi::rule<It, AST::V(),           qi::space_type> value_;
};

Note:

BOOST_SPIRIT_DEBUG_NODES enables rule debugging

The order of interpolate_ | repeat_ is important, since repeat_ also parses individual numbers (so it would prevent FROM from being parsed in time.

A simple utility to invoke the parser and also expand() the intermediate representation:

AST::values do_parse(std::string const& input) {
    static const Grammar<std::string::const_iterator> g;

    auto f = begin(input), l = end(input);
    AST::elements intermediate;
    if (!qi::parse(f, l, g >> qi::eoi, intermediate)) {
        throw std::runtime_error("bad input");
    }

    return expand(intermediate);
}

Testing

The proof of the pudding is in the eating:

Live On Coliru

int main() {
    std::cout << std::boolalpha;

    struct { std::string input; AST::values expected; } cases[] = {
        { "1 1.5 2 2.5 3 FOR 4 3.5", { 1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5 } },
        { "1 2 3 4 RAMP 3 6 7 8", { 1, 2, 3, 4, 5, 6, 7, 8 } },
    };

    for (auto const& test : cases) {
        try {
            std::cout << std::quoted(test.input) << " -> ";
            auto actual = Parse::do_parse(test.input);
            std::cout << (actual==test.expected? "PASSED":"FAILED") << " { ";

            // print the actual for reference
            std::cout << " {";
            for (auto& v : actual) std::cout << v << ", ";
            std::cout << "}\n";
        } catch(std::exception const& e) {
            std::cout << "ERROR " << std::quoted(e.what()) << "\n";
        }
    }
}

Printing

"1 1.5 2 2.5 3 FOR 4 3.5" -> PASSED {  {1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5, }
"1 2 3 4 RAMP 3 6 7 8" -> PASSED {  {1, 2, 3, 4, 5, 6, 7, 8, }

Using Semantic Actions Instead

This might be more efficient and I found I actually prefer the expressiveness of this approach.

It might not scale well as the grammar grows more complicated though.

Here we "invert" the flow:

Grammar() : Grammar::base_type(start) {
    element_ =
          qi::double_                          [ px::push_back(qi::_val, qi::_1) ]
        | ("FOR" >> qi::uint_)                 [ handle_for(qi::_val, qi::_1) ]
        | ("RAMP" >> qi::uint_ >> qi::double_) [ handle_ramp(qi::_val, qi::_1, qi::_2) ]
        ;

    start = qi::skip(qi::space) [ *element_ ];
}

Here handle_for and handle_ramp in the semantic actions are Lazy Actors that basically perform the same operation as expand() did in the AST-based appraoch, but

on the fly
the first operand is implicit (it is the last value already at the back of the vector)

This makes for a few extra checks (we don't want UB when the user passes a string that starts with "FOR" or "RAMP"):

    struct handle_for_f {
        void operator()(Values& vec, unsigned n) const {
            if (vec.empty() || n<1)
                throw std::runtime_error("bad quantifier");

            vec.insert(vec.end(), n-1, vec.back());
        }
    };

    struct handle_ramp_f {
        void operator()(Values& vec, unsigned n, double target) const {
            if (vec.empty())
                throw std::runtime_error("bad quantifier");
            if ((n == 0) || (n == 1 && (vec.back() != target)))
                throw std::runtime_error("bad interpolation");

            auto start = vec.back();

            if (n>2) {
                auto const delta = (target-start)/(n-1);
                for (std::size_t i=1; i<(n-1); ++i)
                    vec.push_back(start + i * delta);
            }
            if (n>1) { vec.push_back(target); }
        }
    };

To avoid tedious boost::phoenix::bind in the semantic actions, let's adapt as Phoenix Functions:

    px::function<handle_for_f> handle_for;
    px::function<handle_ramp_f> handle_ramp;

Parsing

The do_parse helper became simpler because we have no intermediate representation:

Values do_parse(std::string const& input) {
    static const Grammar<std::string::const_iterator> g;

    auto f = begin(input), l = end(input);
    Values values;
    if (!qi::parse(f, l, g >> qi::eoi, values)) {
        throw std::runtime_error("bad input");
    }

    return values;
}

Testing

Again, the proof of the pudding is in the eating. The test program with unmodified main():

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <iomanip>

using Values = std::vector<double>;

namespace Parse {
    namespace qi = boost::spirit::qi;
    namespace px = boost::phoenix;

    template <typename It> struct Grammar : qi::grammar<It, Values()> {
        Grammar() : Grammar::base_type(start) {
            element_ =
                  qi::double_                          [ px::push_back(qi::_val, qi::_1) ]
                | ("FOR" >> qi::uint_)                 [ handle_for(qi::_val, qi::_1) ]
                | ("RAMP" >> qi::uint_ >> qi::double_) [ handle_ramp(qi::_val, qi::_1, qi::_2) ]
                ;

            start = qi::skip(qi::space) [ *element_ ];
        }
      private:
        qi::rule<It, Values()> start;
        qi::rule<It, Values(), qi::space_type> element_;

        struct handle_for_f {
            void operator()(Values& vec, unsigned n) const {
                if (vec.empty() || n<1)
                    throw std::runtime_error("bad quantifier");

                vec.insert(vec.end(), n-1, vec.back());
            }
        };

        struct handle_ramp_f {
            void operator()(Values& vec, unsigned n, double target) const {
                if (vec.empty())
                    throw std::runtime_error("bad quantifier");
                if ((n == 0) || (n == 1 && (vec.back() != target)))
                    throw std::runtime_error("bad interpolation");

                auto start = vec.back();

                if (n>2) {
                    auto const delta = (target-start)/(n-1);
                    for (std::size_t i=1; i<(n-1); ++i)
                        vec.push_back(start + i * delta);
                }
                if (n>1) { vec.push_back(target); }
            }
        };

        px::function<handle_for_f> handle_for;
        px::function<handle_ramp_f> handle_ramp;
    };

    Values do_parse(std::string const& input) {
        static const Grammar<std::string::const_iterator> g;

        auto f = begin(input), l = end(input);
        Values values;
        if (!qi::parse(f, l, g >> qi::eoi, values)) {
            throw std::runtime_error("bad input");
        }

        return values;
    }
}

int main() {
    std::cout << std::boolalpha;

    struct { std::string input; Values expected; } cases[] = {
        { "1 1.5 2 2.5 3 FOR 4 3.5", { 1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5 } },
        { "1 2 3 4 RAMP 3 6 7 8", { 1, 2, 3, 4, 5, 6, 7, 8 } },
    };

    for (auto const& test : cases) {
        try {
            std::cout << std::quoted(test.input) << " -> ";
            auto actual = Parse::do_parse(test.input);
            std::cout << (actual==test.expected? "PASSED":"FAILED") << " { ";

            // print the actual for reference
            std::cout << " {";
            for (auto& v : actual) std::cout << v << ", ";
            std::cout << "}\n";
        } catch(std::exception const& e) {
            std::cout << "ERROR " << std::quoted(e.what()) << "\n";
        }
    }
}

Printing the same as before:

"1 1.5 2 2.5 3 FOR 4 3.5" -> PASSED {  {1, 1.5, 2, 2.5, 3, 3, 3, 3, 3.5, }
"1 2 3 4 RAMP 3 6 7 8" -> PASSED {  {1, 2, 3, 4, 5, 6, 7, 8, }

¹ Boost Spirit: "Semantic actions are evil"?

Wow thanks. I did my best with a semantic action lambda (capturing the double vector) but I cannot figure out the correct signature for the ramp handler. `[&vec](const double &val)` works great as a handler for `double_`, but I just can't figure it out for `qi::no_case[qi::lit("ramp")] >> qi::int_ >> qi::double_`. Was hoping for something less verbose. — Brian, Dec 05 '19 at 20:21
Just figured it out! It's `[&vec](const boost::fusion::vector & vals)`. Got the compiler to tell me it by using `[&vec](auto &vals)` and doing something not allowed with `vals`. Thanks compiler. — Brian, Dec 05 '19 at 20:58

score 0 · Answer 2 · answered Dec 05 '19 at 21:52

0

Here's what I ultimately went it. It uses Semantic Actions but it's simpler than @sehe's probably more correct answers: doesn't use template functions, doesn't use phoenix, doesn't require custom Grammar structs.

#include <iostream>
#include <string>
#include <vector>

#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;
namespace fusion = boost::fusion;

std::vector<double> parseVector(const std::string & vec_str)
{
    std::vector<double> vec;

    auto for_handler = [&vec, &vec_str](const unsigned &len) {
        if (len == 0)
            throw std::runtime_error("Invalid vector: " + vec_str);
        vec.insert(vec.end(), len - 1, vec.back());
    };
    auto ramp_handler = [&vec, &vec_str](const fusion::vector<unsigned, double> & vals) {
        double start = vec.back();
        double target = fusion::at_c<1>(vals);
        unsigned len = fusion::at_c<0>(vals);
        if (len == 0 || (len == 1 && start != target))
            throw std::runtime_error("Invalid vector: " + vec_str);
        if (len >= 2) {
            for (unsigned i = 0; i < len - 2; i++)
                vec.push_back(start + (i + 1) * (target - start) / (len - 1));
            vec.push_back(target);
        }
    };
    auto double_handler = [&vec](const double &val) {
        vec.push_back(val);
    };

    auto for_rule = qi::no_case[qi::lit("for") | qi::lit('*')] >> qi::uint_;
    auto ramp_rule = qi::no_case[qi::lit("ramp")] >> qi::uint_ >> qi::double_;
    auto vec_rule = qi::double_[double_handler] >> *(for_rule[for_handler] | ramp_rule[ramp_handler] | qi::double_[double_handler]);

    auto it = vec_str.begin();
    if (!qi::phrase_parse(it, vec_str.end(), vec_rule, qi::ascii::space) || it != vec_str.end())
        throw std::runtime_error("Invalid vector: " + vec_str);

    return vec;
}

Now if I could only make "1 for 4.5 1" throw an error instead of resolve to {1 1 1 1 0.5 1 }. Sigh.

answered Dec 05 '19 at 21:52

Brian

2,499
3
24
26

I like the brevity! Thist + gets it to work is the sign of a good developer. (Slight nit: inlining the `delta` and making the loop 0-based would not pass my code-review since it actively obfuscates the logic). – sehe Dec 06 '19 at 11:24
However, the "custom grammar struct" wasn't just for show. At the very least you need named `qi::rules` here (ironically, your naming suggests they would be). As written the behaviour is unspecified and very likely undefined. See [the many answers about "auto" and Spirit Qi rules](user:85371 boost_spirit_auto). In recent Boost Versions you can use the `qi::copy` facility to make sure you get deep-cloned expression trees. – sehe Dec 06 '19 at 11:28
The removal of the `empty()` check before using `back()` is accurate, but it hinges on the assumption that you can force the grammar to input with a double value. I didn't want to make that assumption as it would stop parsing empty input. – sehe Dec 06 '19 at 11:29
About the last question, you could make that a separate question, but let me show you in the comments for now. You need a lookahead assertion (either positive: `qi::lexeme[qi::uint_ >> &qi::space]` or negative `!strict_double_ >> qi::uint_`). **[This Live Demo](http://coliru.stacked-crooked.com/a/313ffdcaf6d98ef9)** shows you that while also bundling all the actions in a single handler, centralizing error reporting (using the `pass` variable). – sehe Dec 06 '19 at 11:40
Backgrounders: 1. [params to SA handlers](https://stackoverflow.com/a/3067881/85371) 2. [why the `lexeme[]`](https://stackoverflow.com/questions/17072987/boost-spirit-skipper-issues/17073965#17073965) with the positive look-ahead 3. note the use of `>> qi::eoi` instead of tedious checking for the end iterator 4. note the use of `qi::copy` 5. frivolous note: taking `double` or `unsigned` by `const&` is an anti-pattern if you don't intend to take them by reference. – sehe Dec 06 '19 at 11:44
Slight improvement dropping `std::ref` and the `mutable` member hack: [here](http://coliru.stacked-crooked.com/a/0dac590ad05cd374) (also fixes my stray edit in the first test-case ¯\\_(ツ)_/¯) – sehe Dec 06 '19 at 11:47
Not to scare you but just proving [my point about undefined behaviour](https://stackoverflow.com/questions/59198525/boost-spirit-how-to-use-custom-logic-when-parsing-a-list-of-doubles-with-text-s#comment104640845_59203798) in your code: the [code from your answer segfaults on coliru](http://coliru.stacked-crooked.com/a/3bfbc3823a70b3c7) (also for me locally) – sehe Dec 06 '19 at 11:58
Thanks for the heads up on the undefined behavior!. Good thing it happened to work on my compiler else I'd have thrown up my hands in frustration and chucked my computer out the window. Almost all my `for` loops with simple indexes start at 0. I find it helps me avoid off-by-one bugs: when constructing the loop all I have to think about it "how many times do I need to execute this" and then I solve what the index means in the next step. It separates concerns. As for the delta thing, I'm probably overthinking it, but I feel that adding a fixed delta might lead to floating point drift. Shrug. – Brian Dec 06 '19 at 15:50
Yeah that's overthinking because you're using the exact same formula, but less readable :) The point of running `i` from `[1,n-1]` is to _express intent_: it has the 2nd till penultimate period. The floating point inaccuracy issue was precisely what I chose that for, otherwise you'd simply write `for (unsigned i = 1; i < len; i++) _vec.push_back(start + i * delta);` risking a slightly off representation of the final value in rare cases. – sehe Dec 06 '19 at 18:31

Boost spirit: how to use custom logic when parsing a list of doubles with text specifiers

2 Answers2

Using an AST Representation

Parsing

Testing

Using Semantic Actions Instead

Parsing

Testing