2

In a previous post (How do you create a generic parser using qi?) I was given some very good advice on how to create a parser-factory. Basic examples work fine, but I have problems combining factory-generated parsers. I would like to understand why my parsers fail and if my problem can be solved or I have to live with the current situation.

Example code (trimmed down as much as I could):

#include <boost/spirit/include/qi.hpp>
#include <boost/variant.hpp>
#include <string>


using namespace boost::spirit::qi;
using parse_iter = std::string::const_iterator;

//  Rudimentary test of a boost::spirit qi parser
template<class Parser>
void check_parser(Parser& p,std::string s)
{
    parse_iter first { s.begin() };
    parse_iter end { s.end() };
    bool ok = phrase_parse(first,end,p,space);
    std::cout << "Parsing [" << s << "]. Status: " 
        << (ok ? "OK\n": "Failed!\n");
}


//  Parser factory. Should have a static get() method
//  that returns something that can parse a valid T.
template<class T> struct parse_factory;

//  An example of using the factory
//  Specialising rule for int
template<>
struct parse_factory<int> 
{
    static int_type get() { return int_; } 
};


//  This ugly macro assures that we use the same rule for strings
//#define PARSE_STR as_string[lexeme[lit('"') >> *( ~char_("\""))>> lit('"')]| +alnum]
#define PARSE_STR (lexeme[lit('"') >> *( ~char_("\""))>> lit('"')]| +alnum)

template<>
struct parse_factory<std::string>
{
    static typename boost::proto::terminal<rule<parse_iter,std::string()>>::type get()
    {
        rule<parse_iter,std::string()> res;
//        debug(res);
        res = PARSE_STR;
        return res.copy();
    }
};


    //  Testing union
//  In case you wonder about naming: the real code is related to ASN.1 
//  parsing where "CHOICE" corresponds to a boost::variant
    using my_choice = boost::variant<int,std::string>;


template<>
struct parse_factory<my_choice>
{
    static boost::proto::terminal<rule<parse_iter,my_choice()>>::type get()
    {
        rule<parse_iter,my_choice()> r;
        r = int_ | PARSE_STR;
        return r.copy();
    }
};

void test_choice()
{
    auto choice_rule1 = int_ | PARSE_STR;

    check_parser(choice_rule1,"1");         //  Prints OK
    check_parser(choice_rule1,"Hello1");     //  Prints OK


    auto choice_rule2 = (parse_factory<int>::get() 
               | parse_factory<std::string>::get());

    check_parser(choice_rule2,"2");         //  Prints OK
    check_parser(choice_rule2,"Hello2");     //  Prints Failed! <=========================
    check_parser(parse_factory<std::string>::get(),"Hello");


    auto choice_rule3 = parse_factory<my_choice>::get();
    check_parser(choice_rule3,"3");         //  Prints OK
    check_parser(choice_rule3,"Hello3");     //  Prints OK


    auto choice_rule4 = int_ | parse_factory<std::string>::get();

    check_parser(choice_rule4,"4");         //  Prints OK
    check_parser(choice_rule4,"Hello4");     //  Prints Failed! <=========================


    auto choice_rule5 = parse_factory<int>::get() | PARSE_STR;

    check_parser(choice_rule5,"5");         //  Prints OK
    check_parser(choice_rule5,"Hello5");     //  Prints OK
}

int main() 
{
    test_choice(); 
}

So e.g. using rule #2

(parse_factory<int>::get() | parse_factory<std::string>::get()) 

causes the parsing to fail. I can successfully parse an int, but not a std::string. Update: If I change "get" for int, nothing compiles (both int and std::string input fails) when I use the factory-version for int. Unless I use the my_choice factory which continues to work. Interestingly, reversing the order of the alternatives changes nothing: I can parse the int, not the string. So - any ideas if/how this problem can be solved?

Community
  • 1
  • 1
user3721426
  • 273
  • 2
  • 8
  • 1
    The right tool to solve such problems is to use your debugger, but not to ask at Stack Overflow before you did so. Tell us all your observations you made when inspecting your code stepping through line by line in 1st place. Also you might want to read [**How to debug small programs (by Eric Lippert)](https://ericlippert.com/2014/03/05/how-to-debug-small-programs/)**] At least leave us with a **[MCVE]** that reproduces your problem. (This is a personal stock comment provided by πάντα ῥεῖ™) – πάντα ῥεῖ Aug 19 '16 at 01:07
  • 1
    http://stackoverflow.com/a/26411266/2417774 – llonesmiz Aug 19 '16 at 05:51
  • 1
    Your program is rife with UB, so much should be obvious when run with memory checking enabled. Moreover some things are non-standard (can't pass non-const `Parser&` from a temporary e.g.). Here's [a fixed version with simplifications](http://coliru.stacked-crooked.com/a/6ed53a6bb4e11da8) – sehe Aug 20 '16 at 14:53
  • @πάνταῥεῖ What part of the SSCCE was not sufficient for you? – sehe Aug 20 '16 at 14:55
  • @sehe This one from my comment; _"Tell us all your observations you made when inspecting your code stepping through line by line in 1st place."_. – πάντα ῥεῖ Aug 20 '16 at 14:57
  • Please make your comments better targeted next time @πάνταῥεῖ – sehe Aug 20 '16 at 14:59
  • I believe the code was SSCCE, even if there was a small bug where the parser was passed on by value where it should have been by reference. This part of the code was not central to the problem as its only purpose was to demonstrate that each parser worked well in isolation but not combined. This error escaped me as I did a last minute test on my home pc using MSVC on a new project that was incorrectly set-up. – user3721426 Aug 21 '16 at 22:54

2 Answers2

4

As I commented before, your program used auto-assignment of Proto expressions.

This means you copied temporaries, making the held references dangling.

That created UB.

You need boost::proto::deep_copy (or qi::copy in recent versions) to actually copy the expression template safely.

Here's a fixed version that works. Note the simplifications:

Live On Coliru

#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;
using parse_iter = std::string::const_iterator;

//  Rudimentary test of a boost::spirit qi parser
template<class Parser>
void check_parser(Parser const& p,std::string s)
{
    parse_iter first { s.begin() };
    parse_iter end { s.end() };
    bool ok = qi::phrase_parse(first,end,p,qi::space);
    std::cout << "Parsing [" << s << "]. Status: " 
        << (ok ? "OK\n": "Failed!\n");
}


//  Parser factory. Should have a static get() method
//  that returns something that can parse a valid T.
template<class T> struct parse_factory;

//  An example of using the factory
//  Specialising rule for int
template<>
struct parse_factory<int> 
{
    static qi::int_type get() { return qi::int_; } 
};


//  This ugly macro assures that we use the same rule for strings
#define PARSE_STR (qi::lexeme['"' >> *~qi::char_('"') >> '"']| +qi::alnum)

template<>
struct parse_factory<std::string>
{
    //static typename boost::proto::terminal<rule<parse_iter,std::string()>>::type get()
    static typename qi::rule<parse_iter,std::string()> get()
    {
        qi::rule<parse_iter,std::string()> res;
        debug(res);
        res = PARSE_STR;
        return res;
    }
};


    //  Testing union
//  In case you wonder about naming: the real code is related to ASN.1 
//  parsing where "CHOICE" corresponds to a boost::variant
    using my_choice = boost::variant<int,std::string>;


template<>
struct parse_factory<my_choice>
{
    static qi::rule<parse_iter,my_choice()> get()
    {
        qi::rule<parse_iter,my_choice()> r;
        r = qi::int_ | PARSE_STR;
        return r;
    }
};

void test_choice()
{
    auto choice_rule1 = qi::copy(qi::int_ | PARSE_STR);

    check_parser(choice_rule1,"1");         //  Prints OK
    check_parser(choice_rule1,"Hello1");     //  Prints OK


    auto choice_rule2 = qi::copy(parse_factory<int>::get() 
            | parse_factory<std::string>::get());

    check_parser(choice_rule2,"2");         //  Prints OK
    check_parser(choice_rule2,"Hello2");     //  Prints Failed! <=========================
    check_parser(parse_factory<std::string>::get(),"Hello");


    auto choice_rule3 = qi::copy(parse_factory<my_choice>::get());
    check_parser(choice_rule3,"3");         //  Prints OK
    check_parser(choice_rule3,"Hello3");     //  Prints OK


    auto choice_rule4 = qi::copy(qi::int_ | parse_factory<std::string>::get());

    check_parser(choice_rule4,"4");         //  Prints OK
    check_parser(choice_rule4,"Hello4");     //  Prints Failed! <=========================


    auto choice_rule5 = qi::copy(parse_factory<int>::get() | PARSE_STR);

    check_parser(choice_rule5,"5");         //  Prints OK
    check_parser(choice_rule5,"Hello5");     //  Prints OK
}

int main() 
{
    test_choice(); 
}

Prints

Parsing [1]. Status: OK
Parsing [Hello1]. Status: OK
Parsing [2]. Status: OK
Parsing [Hello2]. Status: OK
Parsing [Hello]. Status: OK
Parsing [3]. Status: OK
Parsing [Hello3]. Status: OK
Parsing [4]. Status: OK
Parsing [Hello4]. Status: OK
Parsing [5]. Status: OK
Parsing [Hello5]. Status: OK

And it runs clean under valgrind.

sehe
  • 374,641
  • 47
  • 450
  • 633
  • First of all, thank you for your patience with me not understanding how to copy spirit::qi elements. I must admit that I still do not quite understand them. At http://coliru.stacked-crooked.com/a/ffdef061f3638d1a I expanded your example in order to be able to parse a vector of choices. This example fails with a core dump. Notice that I copy rules both in the factory for vector and in my local rule-creation (in the beginning of main). So what mistake did I make here? – user3721426 Aug 30 '16 at 22:49
  • To be frank, I have no clue. I think you're overcomplicating and I don't see justification. For fun, here's something that does work: http://coliru.stacked-crooked.com/a/0d6a71c4ef6c5e00 – sehe Aug 31 '16 at 01:40
  • @user3721426 As shown in your previous question, you can't return a rule by value you need to use `.copy()` and change the return type appropriately `proto::terminal::type`. [Example](http://melpon.org/wandbox/permlink/Vrl13ZRuCFavHVDg). – llonesmiz Aug 31 '16 at 05:38
  • @sehe To give a little bit of background, my application is related to air traffic control and the layout of my data is related to a specification that is carved in stone. This spec is full of types that are naturally represented as tuples variants, but since there are so many of them (there are choices with more than 100 alternatives), it would be a huge simplification if I could automate some of the text-input. So my idea is to have a templated userinput method and autogenerate e.g. the GUI. Here spirit could help if for the applicable fields, it could automatically generate the grammar. – user3721426 Aug 31 '16 at 08:44
  • @jv_ I believe I did copy rules in all my production codes. – user3721426 Aug 31 '16 at 08:45
  • 1
    @jv_ Quite frankly, trying to navigate all these pitfalls with `copy` and whatnot seems like building a house of cards in a tornado zone. (That's why I side stepped that hot mess). – sehe Aug 31 '16 at 09:13
  • @user3721426 Re. your application domain: that calls for code generation. The approach with dynamically compositing Qi grammars will likely not scale at all (have you tried variants with 100 element types?). I'd only consider such an approach with Spirit X3 (so using c++14 and variadics). Then again, if your grammar is "cast in stone", what is the gain of using an EDSL grammar? Just use an existing library. Prefer reliability. – sehe Aug 31 '16 at 09:21
  • @sehe I do not have a problem having 100 elements in my variant. It is actually capable of having far more (which is good since there is a tentative new version of the standard with variant with more than 500 elements). I do realise that these large What is cast in stone is the data format (e.g. the network layout of the variants). The textual representation of a field is also more or less fixed, but in other standards. As an example, one field-type is Speed.Speed is a variant, depending on the type of speed (e.g. relative to air or relative to ground) and on the units (knots or km/h).... – user3721426 Aug 31 '16 at 10:14
  • ... so I try to use qi to e.g. automatically create the parsers for speed based on the variant defining speed and similarly for other types (and there are loads of them), leaving me to define only the basic building blocks (e.g. speed in knots), for which many parsers (some integral values, for example) don't need a hand-written parser at all. Some words fell out in my prev. comment. I did realise that the huge variants are not suitable for automated use with qi. – user3721426 Aug 31 '16 at 10:20
1

I have found a solution to my problems, at least currently anything works so far. The core issue is to have the rule as a static member and return a const reference to that rule. So basically, what I do is to initialise a static rule in a static local memberfunction. This allows me to initialise the rule. My get() function for T then simply has the signature

static rule<Iter,T(),spacer> const& get() 
{ 
    rule<Iter,T(),spacer> const& t_rule = create_rule(); 
    return t_rule;
}

where create_rule creates the rule, stores it in a local static and returns a const reference to the static.

I would like to thank jv_ and sehe for their help in my long and twisted journey trying to make qi do what I want it to do.

user3721426
  • 273
  • 2
  • 8
  • I think my comment [here](http://stackoverflow.com/questions/39029434/spiritqi-combining-parser-factory-fails/39359447#comment65815997_39211690) still stands. Glad you took that idea but you seem to still be overcomplicating things (a symptom is that the code in your answers contributes in no way to the mechanics or the functionality anymore. All it does is glue the old interface on top of it. Why do you want that?) – sehe Sep 09 '16 at 10:06
  • I do this because the system I am programming against has literally hundreds (probably close to one thousand) of different types (types specified via the ASN.1 standard). These types are modeled in C++ into a much smaller set of generic templates (tuples, vectors, variants are some of those generics). I would go cray if I had to write a parser for each type. My aim is to automatically generate a parser for each type by having a parser-factory that automatically creates the parse rule for each type. So far it works for tuples, choices, floating-point types, integral types and ... – user3721426 Sep 09 '16 at 15:52
  • ... enumerations, including so far as I can tell combinations of e.g. choices of tuples composed of choices or tuples. So I believe that my approach is working quite well. I have tried to explain this before, so there is a chance that I have not understood your question? Missing is parsing of a few of the template-classes representing the domain, and I need to improve error handling, so that I instead of just reporting that input is invalid also can tell why (e.g. "this integral value must be larger than 60"). – user3721426 Sep 09 '16 at 16:06