
I'm generally familiar with using qi::attr to implement a "default value" for a missing entry in parsed input. But I haven't seen how to do this when the default value needs to be pulled from an earlier parse.

I'm trying to parse into the following struct:

struct record_struct {

    std::string Name;
    uint8_t Distance;
    uint8_t TravelDistance;
    std::string Comment;
};

The input comes from a relatively simple "(text) (number) [(number)] [//comment]" format, where both the second number and the comment are optional. If the second number is not present, its value should be the same as the first number.

What follows is a cut-down example of working code that doesn't QUITE do what I want: it defaults the second number to 0 rather than to the first number. If possible, I'd like to isolate the parsing of the two integers to a separate parser rule, without giving up using the fusion struct.

Things I've tried that haven't compiled:

  • Replacing qi::attr(0) with qi::attr(qi::_2)
  • Modifying the value after the fact with a semantic action on the attr match: `qi::attr(0)[qi::_3 = qi::_2]`

The full test code:

#include <string>
#include <cstdint>
#include <iostream>
#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/include/qi.hpp>

struct record_struct {

    std::string Name;
    uint8_t Distance;
    uint8_t TravelDistance;
    std::string Comment;
};

BOOST_FUSION_ADAPT_STRUCT(
    record_struct,
    (std::string, Name)
    (uint8_t, Distance)
    (uint8_t, TravelDistance)
    (std::string, Comment)
)

std::ostream &operator<<(std::ostream &o, const record_struct &s) {
    o << s.Name << " (" << +s.Distance << ":" << +s.TravelDistance << ") " << s.Comment;
    return o;
}

bool test(std::string s) {
    std::string::const_iterator iter = s.begin();
    std::string::const_iterator end = s.end();
    record_struct result;
    namespace qi = boost::spirit::qi;
    bool parsed = boost::spirit::qi::parse(iter, end, (
                    +(qi::alnum | '_') >> qi::omit[+qi::space]
                    >> qi::uint_ >> ((qi::omit[+qi::space] >> qi::uint_) | qi::attr(0))
                    >> ((qi::omit[+qi::space] >> "//" >> +qi::char_) | qi::attr(""))
                ), result);
    if (parsed) std::cout << "Parsed: " << result << "\n";
    else std::cout << "Failed: " << std::string(iter, end) << "\n";
    return parsed;
}

int main(int argc, char **argv) {

    if (!test("Milan 20 22")) return 1;
    if (!test("Paris 8 9 // comment")) return 1;
    if (!test("London 5")) return 1;
    if (!test("Rome 1 //not a real comment")) return 1;
    return 0;
}

Output:

Parsed: Milan (20:22)
Parsed: Paris (8:9)  comment
Parsed: London (5:0)
Parsed: Rome (1:0) not a real comment

Output I want to see:

Parsed: Milan (20:22)
Parsed: Paris (8:9)  comment
Parsed: London (5:5)
Parsed: Rome (1:1) not a real comment
jkerian

1 Answer


First of all, instead of spelling out omit[+space], just use a skipper:

bool parsed = qi::phrase_parse(iter, end, (
                   qi::lexeme[+(qi::alnum | '_')]
                >> qi::uint_ >> (qi::uint_ | qi::attr(0))
                >> (("//" >> qi::lexeme[+qi::char_]) | qi::attr(""))
            ), qi::space, result);

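Here, qi::space is the skipper. lexeme[] suppresses skipping inside its sub-expression (it only pre-skips once before it), which keeps the name in one piece and preserves the spaces inside the comment (see Boost spirit skipper issues).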

Next, there is more than one way to fill in the default from the first number:

  1. Use a local attribute to temporarily store the first number:


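    // It is the iterator type (std::string::const_iterator here);
    // unqualified names come from boost::spirit::qi
    rule<It, record_struct(), locals<uint8_t>, space_type> g;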
    
    g %= lexeme[+(alnum | '_')]
         >> uint_ [_a = _1] >> (uint_ | attr(_a))
         >> -("//" >> lexeme[+char_]);
    
    parsed = phrase_parse(iter, end, g, space, result);
    

    This requires:

    • a qi::rule declaration, in order to declare the qi::locals<uint8_t>; qi::_a is the placeholder for that local attribute
    • initializing the rule as an "auto-rule" (docs), i.e. with %=, so that the semantic action does not suppress automatic attribute propagation
  2. There's a wacky hybrid where you don't actually use locals<> but just refer to an external variable. This is generally a bad idea, but since your parser is neither recursive nor reentrant, you could do it:


    // assumed declarations: namespace phx = boost::phoenix; uint8_t dist_;
    parsed = phrase_parse(iter, end, (
                   lexeme[+(alnum | '_')]
                >> uint_ [ phx::ref(dist_) = _1 ] >> (uint_ | attr(phx::ref(dist_)))
                >> (("//" >> lexeme[+char_]) | attr(""))
            ), space, result);
    
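  3. You could go full Boost Phoenix and pull the value right out of the rule's own attribute: attr() also accepts a lazy Phoenix argument, and phx::at_c<1>(_val) reads back the Distance member that has already been parsed into the struct. Since _val is the synthesized attribute of the enclosing qi::rule, the expression needs to live in a rule:

    // phx::at_c comes from <boost/phoenix/fusion/at.hpp>
    rule<It, record_struct(), space_type> g;
    
    g %= lexeme[+(alnum | '_')]
         >> uint_ >> (uint_ | attr(phx::at_c<1>(_val)))
         >> (("//" >> lexeme[+char_]) | attr(""));
    
    parsed = phrase_parse(iter, end, g, space, result);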
    
  4. You could parse into optional<uint8_t> and post-process the result:


    std::string              name;
    uint8_t                  distance;
    boost::optional<uint8_t> travelDistance;
    std::string              comment;
    
    parsed = phrase_parse(iter, end, (
                   lexeme[+(alnum | '_')]
                >> uint_ >> -uint_
                >> -("//" >> lexeme[+char_])
            ), space, name, distance, travelDistance, comment);
    
    result = { name, distance, travelDistance? *travelDistance : distance, comment };
    

Post Scriptum

I noticed this a little late:

If possible, I'd like to isolate the parsing of the two integers to a separate parser rule, without giving up using the fusion struct.

Well, of course you can:

rule<It, uint8_t(uint8_t)> def_uint8 = uint_parser<uint8_t>() | attr(_r1);

This is also more accurate, because uint_parser<uint8_t> doesn't accept unsigned values that don't fit in a uint8_t. The answer's live Coliru demo mixes and matches the approaches above.

sehe
  • Thanks (again) sehe... the grammar I'm working with here has syntactically significant whitespace in a few other places, although this particular rule can probably switch to a space-skipper. – jkerian Dec 29 '14 at 22:03
  • @sehe About your Post Scriptum, I think jkerian means something like [this](http://coliru.stacked-crooked.com/a/8e26cffe810fd12f) (although preferably with an approach that didn't modify the struct). – llonesmiz Dec 30 '14 at 14:25
  • @cv_and_he Thanks for that make-it-even-completer addition. I realize that, but I don't think there are effective ways to do what he describes, so I'd settle for this. I personally think all my samples are pretty viable, although I'd prefer 1 (and, depending on the application, 4) – sehe Dec 30 '14 at 14:51