In Boost.Spirit one can read from a stream to a std::vector simply by doing:

#include <cassert>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
namespace sqi = boost::spirit::qi;
int main(){
        std::string const v_str = "AA BB CC";
        std::vector<std::string> v;
        auto it = begin(v_str);
        // lexeme[] keeps each word contiguous; the space skipper separates the words
        bool r = sqi::phrase_parse(it, end(v_str), 
                    (*sqi::lexeme[+sqi::char_("A-Z")]), sqi::space, v);
        assert( v.size() == 3  and v[2] == "CC" );
}

However, I happen to know the number of elements in advance because of the input format, so I should be able to reserve the space in the vector up front. For example, if the input string is "3 AA BB CC", space for three elements can be allocated in advance.

The question is how to pass this extra information to the vector and optimize the later push_back (e.g. avoiding reallocations).
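To make the goal concrete, here is a minimal standard-library sketch (no Spirit involved) of what the reserve buys. The helper name `count_reallocations` is hypothetical; it counts how often `capacity()` changes while `n` strings are appended with `push_back`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: counts how many times the vector's capacity changes
// while n elements are appended with push_back.
std::size_t count_reallocations(std::size_t n, bool reserve_first) {
    std::vector<std::string> v;
    if (reserve_first) v.reserve(n);  // one allocation up front
    std::size_t changes = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i != n; ++i) {
        v.push_back("AA");
        if (v.capacity() != last_capacity) {  // storage was reallocated
            ++changes;
            last_capacity = v.capacity();
        }
    }
    return changes;
}
```

With `reserve_first == true` the count is zero, because a vector does not reallocate until its size exceeds the reserved capacity. Without the reserve, the count depends on the implementation's growth policy.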

What I tried was to parse an integer at the beginning and attach a semantic action to it that executes the reserve.

        std::string const v_str = "3 AA BB CC";
        std::vector<std::string> v;
        auto it = begin(v_str);
        bool r = sqi::phrase_parse(it, end(v_str), 
             sqi::int_[([&](int i){v.reserve(i);})] >> 
                (*sqi::lexeme[+sqi::char_("A-Z")]), sqi::space, v);

The problem is that the integer is not discarded after the semantic action runs: from my tests I can see that Spirit still tries to push the result (the 3 in the example) into the vector even after the reserve.

Another workaround would be to add another argument to the phrase_parse call, but that seems like overkill.

So, how can I parse something in Boost.Spirit and only execute the semantic action without sending the result to the sink variable?

Even if this can be done I am not really sure if this is the right way to do it.

alfC
  • I think you may try qi::repeat() to achieve what you need. See reference on https://www.boost.org/doc/libs/1_70_0/libs/spirit/doc/html/spirit/qi/reference/directive/repeat.html. – drus Apr 26 '19 at 18:06
  • @drus, the example at the end of the link is exactly what I am looking for, but for some reason in my case the result of the first `char_[...]` parser also ends up in the `str` instead of being used just for the semantic action. Perhaps this is because I use a lambda instead of a phoenix expression. – alfC Apr 26 '19 at 18:18

3 Answers

You can create a fake vector that only counts insertions, and parse the same text twice: once to count the elements and once to fill the real vector after reserving.

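#include <cassert>
#include <cstddef>
#include <string>
#include <vector>
#include <boost/spirit/include/qi.hpp>
namespace sqi = boost::spirit::qi;
// Minimal container-like sink: the compiler asks for insert(end(), value),
// so these members are enough to count the parsed elements.
struct fake_vector
{
    typedef std::string value_type;
    fake_vector() : counter(0) {}
    std::size_t end() const {return 0;}
    void insert(std::size_t, std::string){ ++counter; }

    std::size_t counter;
};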
int main(){
        std::string const v_str = "AA BB CC";
        auto it = begin(v_str);
        fake_vector fv;
        bool r = sqi::phrase_parse(it, end(v_str), (*sqi::lexeme[+sqi::char_("A-Z")]), sqi::space, fv);
        assert(fv.counter == 3);
        std::vector<std::string> v;
        v.reserve(fv.counter);
        it = begin(v_str);
        r = sqi::phrase_parse(it, end(v_str), (*sqi::lexeme[+sqi::char_("A-Z")]), sqi::space, v);
        assert( v.size() == 3  and v[2] == "CC" );
}
user2807083
  • I thought Spirit was using `push_back` (not `insert`). I am going to look into this. At the moment I have a less horrible solution. – alfC Apr 25 '19 at 22:58
  • I thought the same, but the compiler wants `insert(.end(), value)` – user2807083 Apr 25 '19 at 23:01
  • Btw, I do know the number of elements. I want to make the reserve call a part of the parser. – alfC Apr 26 '19 at 00:08
  • The only advantage seems to be that `insert(.end(), value())` works with `vector`, `list`, `deque` and (with different semantics, as a hint) with `[multi]set`, while `push_back` works only with `vector`, `list` and `deque`, not with `[multi]set`. It is probably called through a `boost::phoenix::insert` expression https://www.boost.org/doc/libs/1_66_0/boost/phoenix/stl/container/container.hpp – alfC Apr 29 '19 at 16:13
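
The point made in the last comment can be illustrated with a small standard-library sketch. It is not Spirit's actual code; the function name `collect` is made up for the example. It shows that a single `insert(end(), value)` call appends to sequence containers and also compiles for `std::set`, where the iterator is only a hint.

```cpp
#include <initializer_list>
#include <list>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: fills any standard container using only
// insert(end(), value), the operation discussed in the comments above.
template <class Container>
Container collect(std::initializer_list<std::string> items) {
    Container c;
    for (std::string const& item : items) {
        c.insert(c.end(), item);  // for a set, end() is only a position hint
    }
    return c;
}
```

For example, `collect<std::vector<std::string>>({"b", "a"})` keeps the insertion order, while `collect<std::set<std::string>>({"b", "a", "b"})` yields a sorted set of two elements. The same call written with `push_back` would not compile for the set.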

OK, it seems that I had to give up the convenient facilities of Spirit and convert everything to semantic actions, which created other problems along the way (for example, lexeme[+char_] maps to std::vector<char> instead of the expected std::string).

{
    std::string const v_str = "AA BB CC";
    std::vector<std::string> v;
    auto it = begin(v_str);
    bool r = sqi::phrase_parse(it, end(v_str), 
        (*(sqi::lexeme[(+sqi::char_("A-Z"))][([&](auto&& s){v.emplace_back(begin(s), end(s));})])), sqi::space);
    assert( v.size() == 3);
    assert( v[2] == "CC" );
}
{
    std::string const v_str = "3 AA BB CC";
    std::vector<std::string> v;
    auto it = begin(v_str);
    bool r = sqi::phrase_parse(it, end(v_str), 
        sqi::int_[([&](int i){v.reserve(i);})] >> 
            (*(sqi::lexeme[(+sqi::char_("A-Z"))][([&](auto&& s){v.emplace_back(begin(s), end(s));})])), sqi::space);
    assert( v.size() == 3 );
    assert( v[2] == "CC" );
}

Since this modifies the last argument of phrase_parse I might as well put a dummy int.

alfC
  • Not saying you should do this ([on the contrary](https://stackoverflow.com/questions/8259440/boost-spirit-semantic-actions-are-evil)), but you can without changing the input: http://coliru.stacked-crooked.com/a/981e890c963b0acc – sehe Apr 26 '19 at 09:47
  • @sehe, sorry if I was not clear, it happens to be the case that the input has the number of elements in the front of the input. I want to take advantage of that. I pretty much agree that the path of semantic actions can lead to suffering, but in this case 1) it seems I cannot take advantage of the optimization without a semantic action (someone has to call `reserve`) and 2) the semantic action has no real effect in the state of the object, so I think in principle should be fine. – alfC Apr 26 '19 at 18:13
  • @sehe, I am now trying to use attributes to do this, but not without problems: https://stackoverflow.com/questions/55965469/sematic-action-destroys-the-result-in-spirit-x3. (Also, I was using the wrong version of Spirit.) – alfC May 03 '19 at 07:57

Thanks to the links that @sehe and @drus pointed me to, and after finding out about qi::omit, I realized that I can attach a semantic action and then omit the result.

The format I have to handle is redundant (the size is redundant with the number of elements), so I have to semantically omit something in any case.

    using namespace sqi;
    std::string const v_str = "3 AA BB CC";
    {
        std::vector<std::string> v;
        auto it = begin(v_str);
        bool r = sqi::phrase_parse(
            it, end(v_str), 
            omit[int_] >> *lexeme[+(char_-' ')],
            space, v
        );
        assert( v.size() == 3 and v[2] == "CC" );
    }

But that doesn't mean I cannot use the omitted (redundant) part for optimization purposes or as a consistency check.

    {
        std::vector<std::string> v;
        auto it = begin(v_str);
        bool r = sqi::phrase_parse(
            it, end(v_str), 
            omit[int_[([&](int n){v.reserve(n);})]] >> *lexeme[+(char_-' ')],
            space, v
        );
        assert( v.size() == 3 and v[2] == "CC" );
    }
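
For comparison, and as something to test the Spirit grammar against, the same input format can be handled with plain iostreams. This is only a sketch under assumptions: `counted_result` and `parse_counted` are names invented here, tokens are taken to be whitespace-separated, and the leading count is used both for the reserve and for the consistency check.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type: ok is true only when the declared count
// matches the number of tokens actually read.
struct counted_result {
    bool ok;
    std::vector<std::string> values;
};

// Hypothetical reference parser for inputs such as "3 AA BB CC".
counted_result parse_counted(std::string const& input) {
    counted_result result{false, {}};
    std::istringstream in(input);
    int declared = 0;
    if (!(in >> declared) || declared < 0) return result;  // no valid count
    result.values.reserve(static_cast<std::size_t>(declared));  // optimization
    for (std::string token; in >> token; ) {
        result.values.push_back(std::move(token));
    }
    // consistency check: the redundant count must agree with the data
    result.ok = result.values.size() == static_cast<std::size_t>(declared);
    return result;
}
```

Here `parse_counted("3 AA BB CC")` succeeds with three values, while `parse_counted("2 AA BB CC")` reports a mismatch through `ok == false`.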

I agree that semantic actions are evil, but in my opinion only when they change the state of the sink objects. One can argue that reserve does not change the observable state of the vector: its elements and size stay the same, and only its capacity changes.

In fact, this way I can optimize memory usage with reserve and also the parser execution by using repeat instead of the unbounded Kleene star (*). (Apparently repeat can be more efficient.)

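    {
        // phx is not defined in this snippet; it is presumably
        // namespace phx = boost::phoenix; (with #include <boost/phoenix.hpp>)
        std::vector<std::string> v;
        auto it = begin(v_str);
        int n = 0;
        bool r = sqi::phrase_parse(
            it, end(v_str), 
            omit[int_[([&](int nn){v.reserve(n = nn);})]] >> repeat(phx::ref(n))[lexeme[+(char_-' ')]],
            space, v
        );
        assert( static_cast<std::size_t>(n) == v.size() and v.size() == 3 and v[2] == "CC" );
    }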

(Using phx::ref is essential because the evaluation of n has to be delayed until parse time, after the semantic action has assigned it.)
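
The need for delayed evaluation can be shown with a standard-library analogy. It is not how Phoenix is implemented, and the names `make_eager`, `make_lazy` and `eager_vs_lazy` are invented for this sketch. Passing `n` directly copies its value when the grammar expression is built; holding a reference reads the value only when it is needed.

```cpp
#include <functional>
#include <utility>

// Copies the value of n when the callable is created
// (analogous to writing repeat(n) directly).
std::function<int()> make_eager(int n) { return [n] { return n; }; }

// Keeps a reference and reads n only when called
// (analogous to repeat(phx::ref(n))).
std::function<int()> make_lazy(int& n) { return [&n] { return n; }; }

// Builds both callables first, then changes n, then evaluates them.
std::pair<int, int> eager_vs_lazy() {
    int n = 0;
    std::function<int()> eager = make_eager(n);
    std::function<int()> lazy = make_lazy(n);
    n = 3;  // stands in for the semantic action assigning the parsed count
    return std::make_pair(eager(), lazy());
}
```

`eager_vs_lazy()` returns the pair (0, 3): the eager copy still sees the value from construction time, while the reference sees the value assigned afterwards, which is what the repeat count needs.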

alfC