inconsistent behavior of boost spirit grammar

Question

I have a little grammar that I want to use for a work project. A minimum executable example is:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi_grammar.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>
#pragma GCC diagnostic pop // pops

#include <iostream>

int main()

{
    typedef  unsigned long long ull;

    std::string curline = "1;2;;3,4;5";
    std::cout << "parsing:  " << curline << "\n";

    namespace qi = boost::spirit::qi;
    auto ids = -qi::ulong_long % ','; // '-' allows for empty vecs.
    auto match_type_res = ids % ';' ;
    std::vector<std::vector<ull> > r;
    qi::parse(curline.begin(), curline.end(), match_type_res, r);

    std::cout << "got:      ";
    for (auto v: r){
        for (auto i : v)
            std::cout << i << ",";
        std::cout << ";";
    }
    std::cout <<"\n";
}

On my personal machine this produces the correct output: parsing: 1;2;;3,4;5 got: 1,;2,;;3,4,;5,;

But at work it produces: parsing: 1;2;;3,4;5 got: 1,;2,;;3,

In other words, it fails to parse the vector of long integers as soon as there's more than one element in it.

Now, I have identified that the work system is using boost 1.56, while my private computer is using 1.57. Is this the cause?

Knowning we have some real spirit experts here on stack overflow, I was hoping someone might know where this issue is coming from, or can at least narrow down the number of things I need to check.

If the boost version is the problem, I can probably convince the company to upgrade, but a workaround would be welcome in any case.

[For reference, here's the list of changes from 1.56.0 to 1.57.0](http://www.boost.org/users/history/version_1_57_0.html). I don't see any changes to Spirit, so I'm not sure what to suggest. — Cornstalks, Jul 08 '15 at 13:09
There weren't relevant changes in the library between those versions. — sehe, Jul 08 '15 at 13:33
@Cornstalks, well spotted. I didn't realize that was my question. — Spacemoose, Jul 08 '15 at 14:32
@Cornstalks I don't think it's fair to mark duplicates in retrospect. Note that I added the [tag:auto] tag after answering to aid with search indexing. SO is useful _because_ we _do not_ organize questions according to symptoms/signals, not root causes (that's what tags are for: root cause is c++!). In this case, perhaps ***[this](http://stackoverflow.com/q/26410498/85371)** is a proper duplicate. — sehe, Jul 08 '15 at 17:36
To the OP: in case you are interested, here's how I approached this question: [from 38:00 onwards](https://www.livecoding.tv/video/bachgoldberg-and-a-spiritauto-ub-case/) I was live-coding at the time. You'll notice how quickly I realize that inconsistent output smells of undefined behaviour. Next time, you can use this in your search terms - even though it is particularly tricky to spot the exact source of the UB. Also note, @Cornstalks, that although I spot the problem quickly, I spent ~20 minutes making it a quality answer. — sehe, Jul 08 '15 at 17:36
Awkward typo: "we _do not_ organize" should have become "we do organize" — sehe, Jul 08 '15 at 17:51
@sehe: it is a quality answer indeed, and is why I upvoted it (and your other answer I linked to earlier). — Cornstalks, Jul 09 '15 at 03:28

score 2 · Accepted Answer · edited May 23 '17 at 12:21

You're invoking Undefined Behaviour in your code.

Specifically where you use auto to store a parser expression. The Expression Template contains references to temporaries that become dangling at the end of the containing full-expression¹.

UB implies that anything can happen. Both compilers are right! And the best part is, you will probably see varying behaviour depending on the compiler flags used.

Fix it either by using:

qi::copy (or boost::proto::deep_copy before v.1.55 IIRC)
use BOOST_SPIRIT_AUTO instead of BOOST_AUTO (mostly helpful iff you also support C++03)
use qi::rule<> and qi::grammar<> (the non-terminals) to type-erase and the expressions. This has performance impact too, but also gives more features like
- recursive rules
- locals and inherited attributes
- declared skippers (handy because rules can be implicitly lexeme[] (see here)
- better code organization.

Note also that Spirit X3 promises to drop there restrictions on the use with auto. It's basically a whole lot more lightweight due to the use of c++14 features. Keep in mind that it's not stable yet.

Showing that GCC with -O2 shows undefined results; Live On Coliru
The fixed version:

Live On Coliru

//#pragma GCC diagnostic push
//#pragma GCC diagnostic ignored "-Wunused-local-typedefs"
//#pragma GCC diagnostic ignored "-Wmaybe-uninitialized"
//#pragma GCC diagnostic ignored "-Wunused-variable"
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/qi.hpp>
//#pragma GCC diagnostic pop // pops

#include <iostream>

int main() {
    typedef  unsigned long long ull;

    std::string const curline = "1;2;;3,4;5";
    std::cout << "parsing: '" << curline << "'\n";

    namespace qi = boost::spirit::qi;

#if 0 // THIS IS UNDEFINED BEHAVIOUR:
    auto ids     = -qi::ulong_long % ','; // '-' allows for empty vecs.
    auto grammar = ids % ';';
#else // THIS IS CORRECT:
    auto ids     = qi::copy(-qi::ulong_long % ','); // '-' allows for empty vecs.
    auto grammar = qi::copy(ids % ';');
#endif

    std::vector<std::vector<ull> > r;
    qi::parse(curline.begin(), curline.end(), grammar, r);

    std::cout << "got:      ";
    for (auto v: r){
        for (auto i : v)
            std::cout << i << ",";
        std::cout << ";";
    }
    std::cout <<"\n";
}

Printing (also with GCC -O2!):

parsing: '1;2;;3,4;5'
got:      1,;2,;;3,4,;5,;

¹ (that's basically "at the next semicolon" here; but in standardese)

inconsistent behavior of boost spirit grammar

1 Answers1

Linked