How to achieve proper parsing when some of structure' fields are omitted or are in not the same order as in the structure declaration?

Question

So I have a parser that parses string like 7.5*[someAlphanumStr] or 7.5[someAlphanumStr] into this struct:

struct summand {
    float factor;
    std::string name;
    summand(const float & f):factor(f), name(""){}
    summand(const std::string & n):factor(1.0f), name(n){}
    summand(const float & f, const std::string & n):factor(f), name(n){}
    summand():factor(0.0f), name(""){}
};

but in addition i need to be able parse strings like [someAlphanumStr]*7.4, [someAlphanumStr]5, 7.4 and [someAlphanumStr]. In the last two cases(7.4 and [someAlphanumStr]) i want to set values for fields which are omitted into default values and for this sake i have written for my struct summand constructors with one argument.

Below is my code and result which it produces:

#include <boost/config/warning_disable.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/fusion/include/io.hpp>

#include <iostream>
#include <string>
#include <vector>

namespace client
{
    namespace spirit = boost::spirit;
    namespace qi     = boost::spirit::qi;
    namespace ascii  = boost::spirit::ascii;

    struct summand {
        float factor;
        std::string name;
        summand(const float & f):factor(f), name(""){}
        summand(const std::string & n):factor(1.0f), name(n){}
        summand(const float & f, const std::string & n):factor(f), name(n){}
        summand():factor(0.0f), name(""){}
    };
}

BOOST_FUSION_ADAPT_STRUCT(client::summand,
                      (float, factor)
                      (std::string, name)
                      )

namespace client {

    template <typename Iterator>
    struct summand_parser : qi::grammar<Iterator, summand(), ascii::space_type>
    {
        summand_parser() : summand_parser::base_type(summand_rule)
        {
            using namespace ascii;

            summand_rule %= (qi::float_ >> -qi::lit('*') >> '[' >> qi::lexeme[alpha >> *alnum] >> ']')|('[' >> qi::lexeme[alpha >> *alnum] >> ']' >> -qi::lit('*') >> qi::float_)|(qi::float_)|('[' >> qi::lexeme[alpha >> *alnum] >> ']');

        }

        qi::rule<Iterator, summand(), ascii::space_type> summand_rule;
    };
}

void parseSummandsInto(std::string const& str, client::summand& summands)
{
    typedef std::string::const_iterator It;
    static const client::summand_parser<It> g;

    It iter = str.begin(),
    end = str.end();

    bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, summands);

    if (r && iter == end)
        return;
    else
        throw "Parse failed";
}

int main()
{
    std::vector<std::string> inputStrings = {"7.5*[someAlphanumStr]", "7.5[someAlphanumStr]", "[someAlphanumStr]*7.4", "[someAlphanumStr]5", "7.4", "[someAlphanumStr]"};

    std::for_each(inputStrings.begin(), inputStrings.end(), [&inputStrings](std::string & inputStr) {
        client::summand parsed;
        parseSummandsInto(inputStr, parsed);
        std::cout << inputStr << " -> " << boost::fusion::as_vector(parsed) << std::endl;
    });
}

results (Coliru):

+ clang++ -std=c++11 -O0 -Wall -pedantic main.cpp
+ ./a.out
+ c++filt -t
7.5*[someAlphanumStr] -> (7.5 someAlphanumStr)
7.5[someAlphanumStr] -> (7.5 someAlphanumStr)
[someAlphanumStr]*7.4 -> (115 )
[someAlphanumStr]5 -> (115 )
7.4 -> (7.4 )
[someAlphanumStr] -> (115 omeAlphanumStr)

Thanks to all for clear answers and advices and especially I'm grateful to @sehe.

Couldn't you just provide multiple rules, apply them in turn, and use whichever gives the "best" result? — benjymous, Oct 29 '13 at 16:27
As a general hint, make it a habit to keep your grammar **clean** and simple. Don't live with repeated sub expressions. State the expected attribute types. (In my answer I hope to show you what the result would look like.) Really, programming is hard enough without the added burden of having to make sense of a grammar like that — sehe, Oct 29 '13 at 22:04
Interestingly, my solution, which I think is quite readable, uses 25 LoC (compared to 11 LoC in your sample) for the grammar definition, yet it comes in at the same number of lines overall while adding BOOST_SPIRIT_DEBUG and error handling to the main program. — sehe, Oct 29 '13 at 22:06
Note also: I'd always suggest using `qi::double_` to parse even into a `float` due to accuracy issues described [here](http://stackoverflow.com/q/17391348/85371) and the spirit-general mailing list (my answer includes this workaround) — sehe, Oct 29 '13 at 22:29

sehe · Accepted Answer · 2013-10-29T22:21:08.063

The way to get anything done with Spirit^[1] is to use small steps, simplify rigorously along the way.

Don't live with "cruft" (like, randomly repeated sub expressions). Also, being explicit is good. In this case, I'd start with extracting the repeated sub-expressions and reformatting for legibility:

    name_rule   = '[' >> qi::lexeme[alpha >> *alnum] >> ']';
    factor_rule = qi::float_;

    summand_rule %= 
          (factor_rule >> -qi::lit('*') >> name_rule)
        | (name_rule   >> -qi::lit('*') >> factor_rule)
        | (factor_rule)
        | (name_rule)
        ;

There, much better already, and I haven't changed a thing. But wait! It doesn't compile anymore

    qi::rule<Iterator, std::string(), ascii::space_type> name_rule;
    qi::rule<Iterator, float(),       ascii::space_type> factor_rule;

It turns out that the grammar only "happened" to compile because Spirit's Attribute compatibility rules are so lax/permissive that the characters matched for the name were just being assigned to the factor part (that's where 115 came from: 0x73 is ASCII for s from someAlphanumStr).

OOPS/TL;DW I had quite a lenghty analysis write up here, once, but I clobbered it by closing my browser and SO had only an old draft cached server-side :( I'll boil it down to the bottomline now:

Guideline Use either constructor overloads to assign to your exposed attribute type, or use Fusion Sequence adaptation, but don't mix the two: they will interfere in surprising/annoying ways.

Don't worry, I won't let you go empty handed, of course. I'd just 'manually' direct the factor and name components in their respective 'slots' (members)^[2].

Inherited attributes are a sweet way to have keep this legible and convenient:

// assuming the above rules redefined to take ("inherit") a summand& attribute:
qi::rule<Iterator, void(summand&), ascii::space_type> name_rule, factor_rule;

Just add a simple assignment in the semantic action:

name_rule   = as_string [ '[' >> lexeme[alpha >> *alnum] >> ']' ] 
                        [ _name   = _1 ];
factor_rule = double_   [ _factor = _1 ];

Now, the 'magic dust' is of course in how the _name and _factor actors are defined. I prefer using binds for this, over phx::at_c<N> due to maintenance costs:

static const auto _factor = phx::bind(&summand::factor, qi::_r1);
static const auto _name   = phx::bind(&summand::name,   qi::_r1);

See? That's pretty succinct and clearly shows what is happening. Also, there's no actual need to have Fusion adaptation for summand here.

Now, finally, we can simplify the main rule as well:

    summand_rule = 
              factor_rule (_val) >> - ( -lit('*') >> name_rule   (_val) )
            | name_rule   (_val) >> - ( -lit('*') >> factor_rule (_val) )
        ;

What this does, is simply combine the single-component branches into the dual-component branches by making the trailing part optional.

Note how the summand default constructor takes care of the default values:

struct summand {
    float factor;
    std::string name;

    summand() : factor(1.f), name("") {}
};

Notice how this removed quite some complexity there.

See the fully adapted sample running Live on Coliru which prints:

7.5*[someAlphanumStr] -> (7.5 someAlphanumStr)
7.5[someAlphanumStr] -> (7.5 someAlphanumStr)
[someAlphanumStr]*7.4 -> (7.4 someAlphanumStr)
[someAlphanumStr]5 -> (5 someAlphanumStr)
7.4 -> (7.4 )
[someAlphanumStr] -> (1 someAlphanumStr)

Full Code Listing

#define BOOST_SPIRIT_USE_PHOENIX_V3
//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

namespace client {
    namespace qi     = boost::spirit::qi;
    namespace phx    = boost::phoenix;
    namespace ascii  = boost::spirit::ascii;

    struct summand {
        float factor;
        std::string name;

        summand() : factor(1.f), name("") {}
    };
}

namespace client {

    template <typename Iterator>
    struct summand_parser : qi::grammar<Iterator, summand(), ascii::space_type>
    {
        summand_parser() : summand_parser::base_type(summand_rule)
        {
            using namespace ascii;

            static const auto _factor = phx::bind(&summand::factor, qi::_r1);
            static const auto _name   = phx::bind(&summand::name,   qi::_r1);

            name_rule   = qi::as_string [ '[' >> qi::lexeme[alpha >> *alnum] >> ']' ] 
                                          [ _name   = qi::_1 ] ;
            factor_rule = qi::double_     [ _factor = qi::_1 ] ;

            summand_rule = 
                      factor_rule (qi::_val) >> - ( -qi::lit('*') >> name_rule   (qi::_val) )
                    | name_rule   (qi::_val) >> - ( -qi::lit('*') >> factor_rule (qi::_val) )
                ;

            BOOST_SPIRIT_DEBUG_NODES((summand_rule)(name_rule)(factor_rule))
        }

        qi::rule<Iterator, void(summand&), ascii::space_type> name_rule, factor_rule;
        qi::rule<Iterator, summand(),      ascii::space_type> summand_rule;
    };
}

bool parseSummandsInto(std::string const& str, client::summand& summand)
{
    typedef std::string::const_iterator It;
    static const client::summand_parser<It> g;

    It iter(str.begin()), end(str.end());
    bool r = phrase_parse(iter, end, g, boost::spirit::ascii::space, summand);

    return (r && iter == end);
}

int main()
{
    std::vector<std::string> inputStrings = {
        "7.5*[someAlphanumStr]",
        "7.5[someAlphanumStr]",
        "[someAlphanumStr]*7.4",
        "[someAlphanumStr]5",
        "7.4",
        "[someAlphanumStr]",
    };

    std::for_each(inputStrings.begin(), inputStrings.end(), [&inputStrings](std::string const& inputStr) {
        client::summand parsed;
        if (parseSummandsInto(inputStr, parsed))
            std::cout << inputStr << " -> (" << parsed.factor << " " << parsed.name << ")\n";
        else
            std::cout << inputStr << " -> FAILED\n";
    });
}

^[1] And arguably, anything else in technology

^[2] You can keep the FUSION_ADAPT_STRUCT but it's no longer required as you can see

This one really is a good write-up (I was planning on extending mine, before your really nice comment), and very nice use of `_factor` and `_name`. — llonesmiz, Oct 29 '13 at 22:12
@cv_and_he I've grown quite fond of aliasing the lazy placeholders or actor expressions like that. It really can make the cruftiness palatable, in my taste. I mean, the grammar definition should be "user-servicable", it's not buried deep inside the cogs of some TMP library: it's the "high-level" entry point and should be readable. — sehe, Oct 29 '13 at 22:15

score 2 · Answer 2 · answered Oct 29 '13 at 18:13

2

I'm not sure whether this is the best solution, but I'd solve this by providing initial values for the fusion sequence and than modifying them later with Phoenix:

summand_rule %=
    (qi::float_ >> -(-qi::lit('*') >> '[' >> qi::lexeme[alpha >> *alnum] >> ']'))
  | (qi::attr(0.) >> '[' >> qi::lexeme[alpha >> *alnum] >> ']' >> -(-qi::lit('*') >> qi::float_[ph::at_c<0>(qi::_val) = qi::_1]));

That is, we're giving an initial value of 0. to the first item in the fusion sequence, which gets assigned to factor, and then going back and modifying it later.

If we omit the factor in the reversed case, the attribute type of the rule will exactly model summand and we can use = assignment instead of %=:

summand_rule =
    (qi::float_ >> -(-qi::lit('*') >> '[' >> qi::lexeme[alpha >> *alnum] >> ']'))
  | (qi::attr(0.) >> '[' >> qi::lexeme[alpha >> *alnum] >> ']' >> -(-qi::lit('*') >> qi::omit[qi::float_[ph::at_c<0>(qi::_val) = qi::_1]]));

Demo: http://coliru.stacked-crooked.com/a/46e3e8101a9c10ea

answered Oct 29 '13 at 18:13

ecatmur

152,476
27
293
366

You ended up suggesting basically what I did, minus the style advice. Really, I'm not convinced that mixing more `qi::omit` and `phx::at_c<1>` (wat?) in that _single-line_ grammar of 235 columns [_sic_] was really what the doctor ordered. – sehe Oct 29 '13 at 22:13
Thank you for you replay but in my xCode project it leads to compile time error: `Recursive template instantiation exceeded maximum depth of 128`. After googling i have found hint to add a line to the port file: configure.cxxflags-append -ftemplate-depth-1024 but i don't know where this file is. – Mikhail Zinkovsky Oct 30 '13 at 00:37
@sehe yeah, I ran myself into a corner by trying to keep the fusion sequence adaptation. Inherited attribute rules are definitely the better option, and yours is a great write up - it'd be great if the docs covered this. – ecatmur Oct 30 '13 at 10:08
@ecatmur thanks :) I think the documentation covers [Inherited Attributes](http://www.boost.org/doc/libs/1_54_0/libs/spirit/doc/html/spirit/qi/reference/parser_concepts/nonterminal.html#spirit.qi.reference.parser_concepts.nonterminal.attributes) and also showcases them in the [Mini-XML tutorial](http://www.boost.org/doc/libs/1_54_0/libs/spirit/doc/html/spirit/qi/tutorials/mini_xml___asts_.html) and [an example](http://www.boost.org/doc/libs/1_54_0/libs/spirit/example/qi/unescaped_string.cpp). Now, using them creatively to pass a reference parent rule's attribute was the "novelty". – sehe Oct 30 '13 at 13:31
(_Note that "ditching" ADAPT_STRUCT was merely and accidental side-effect of my dislike of `at_c` for known structs._) – sehe Oct 30 '13 at 13:35

How to achieve proper parsing when some of structure' fields are omitted or are in not the same order as in the structure declaration?

2 Answers2

Full Code Listing