Say I had two doubles separated by a comma to parse returning their sum. I might do it as follows in Haskell:

import Data.Attoparsec.Text
import Data.Text (pack)
dblParse = (\a -> fst a + snd a) <$> ((,) <$> double <* char ',' <*> double)
parseOnly dblParse $ pack "1,2"

The parseOnly statement will yield (Right 3)::Either String Double - where Either is how Haskell often handles errors.

You can kind of get a sense how this works - (,) <$> double <*> double yields a Parser (Double,Double), and applying (\a -> fst a + snd a) makes it a Parser Double.

I'm trying to do the same thing in Qi, but when I expect to get back 3, I actually get back 1:

namespace qi = boost::spirit::qi;
namespace ascii = boost::spirit::ascii;

namespace phx = boost::phoenix;

struct cat
{
    double q;
};

BOOST_FUSION_ADAPT_STRUCT(cat, q)
template <typename Iterator>
struct cat_parser : qi::grammar<Iterator, cat()>
{
    cat_parser() : cat_parser::base_type(start)
    {
        using qi::int_;
        using qi::double_;
        using qi::repeat;
        using qi::eoi;
        using qi::_1;
        double a;
        start %= double_[phx::ref(a) =_1] >> ',' >> double_[a + _1];
    }
    qi::rule<Iterator, cat()> start;
};

int main()
{
    std::string wat("1,2");
    cat_parser<std::string::const_iterator> f;
    cat example;
    std::string::const_iterator st = wat.begin();
    std::string::const_iterator en = wat.end();
    std::cout << parse(st, en, f, example) << std::endl;
    std::cout << example.q << std::endl;
    return 0;
}

My question is twofold: Is this the idiomatic way to do this with Spirit, and why am I getting 1 instead of 3?

sehe
Carbon

1 Answer

First the quick answer

why am I getting 1 instead of 3?

You're likely getting 1 because that's the exposed attribute.³

However, you can't reason about your code due to Undefined Behaviour.

Your semantic actions

  • invoke UB: you assign to a, whose lifetime ends at the end of the parser constructor. That's random memory corruption.

  • have no effect: the action [a + _1] is an expression that results in a temporary: the sum of whatever is at the memory location that used to hold the local variable a at the time of parser construction, and the attribute exposed by the subject parser (double_). In this case it would be "? + 2.0", but it doesn't matter at all because nothing is done with the result: it's merely discarded.
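
The second point can be illustrated without Spirit at all. What follows is only a plain C++ analogy, not Phoenix code: `forEachParsed` is a hypothetical stand-in for a parser that invokes an action per parsed value, and the two lambdas mimic an action that merely computes a sum versus one that stores it.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a parser: invokes `action` once per parsed value.
inline void forEachParsed(std::vector<double> const& parsed,
                          std::function<void(double)> const& action) {
    for (double v : parsed)
        action(v);
}

// Analogue of [a + _1]: the sum is computed and then thrown away.
inline double sumWithDiscardingAction(std::vector<double> const& parsed) {
    double a = 0;
    forEachParsed(parsed, [&a](double v) { (void)(a + v); });
    return a; // still 0: nothing was ever stored
}

// Analogue of [ref(a) += _1]: the result is written back to the accumulator.
inline double sumWithAccumulatingAction(std::vector<double> const& parsed) {
    double a = 0;
    forEachParsed(parsed, [&a](double v) { a += v; });
    return a;
}
```

Unlike the code in the question, the accumulator here outlives every call of the action, so there is no lifetime problem to reason about.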

The normal answer

Taking the requirement to be just:

Say I had two doubles separated by a comma to parse returning their sum

Here's how we'd do it:

double parseDoublesAndSum(std::istream& is) {
    double a, b; char comma;
    if (is >> a >> comma && comma == ',' && is >> b)
        return a + b;

    is.setstate(std::ios::failbit);
    return 0;
}

See it Live On Coliru.
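
A possible way to call the stream version from a string, as a minimal sketch: the function is repeated so the snippet compiles on its own, and `sumFromString` is a hypothetical wrapper (not part of the answer) that reports failure through the stream state.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Repeated from the answer above so this sketch is self-contained.
double parseDoublesAndSum(std::istream& is) {
    double a, b; char comma;
    if (is >> a >> comma && comma == ',' && is >> b)
        return a + b;

    is.setstate(std::ios::failbit);
    return 0;
}

// Hypothetical convenience wrapper: `ok` is false when the input did not match.
double sumFromString(std::string const& text, bool& ok) {
    std::istringstream is(text);
    double sum = parseDoublesAndSum(is);
    ok = !is.fail(); // failbit is set on a bad number or a wrong separator
    return sum;
}
```

Checking `fail()` rather than `good()` matters here, because reading the last number usually sets `eofbit` even on success.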

Yeah, but using Spirit

I get it :)

Well, firstly, we'd realize the exposed attribute is a double, not the list.

Next step is to realize that the individual elements of the list aren't of interest. We can just initialize the result to 0 and use it to accumulate the elements¹, e.g.:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>
#include <iostream>
#include <stdexcept>
#include <string>

double parseDoublesAndSum(std::string const& source) {
    double result = 0;

    {
        using namespace boost::spirit::qi;
        namespace px = boost::phoenix;

        bool ok = parse(source.begin(), source.end(), double_ [ px::ref(result) += _1 ] % ',');
        if (!ok)
            throw std::invalid_argument("source: expect comma delimited list of doubles");
    }

    return result;
}

void test(std::string input) {
    try {
        std::cout << "'" << input << "' -> " << parseDoublesAndSum(input) << "\n";
    } catch (std::exception const& e) {
        std::cout << "'" << input << "' -> " << e.what() << "\n";
    }
}
int main() {
    test("1,2");
    test("1,2,3");
    test("1,2,3");
    test("1,2,inf,4");
    test("1,2,-inf,4,5,+inf");
    test("1,2,-NaN");
    test("1,,");
    test("1");
    test("aaa,1");
}

Prints

'1,2' -> 3
'1,2,3' -> 6
'1,2,3' -> 6
'1,2,inf,4' -> inf
'1,2,-inf,4,5,+inf' -> -nan
'1,2,-NaN' -> -nan
'1,,' -> 1
'1' -> 1
'aaa,1' -> 'aaa,1' -> source: expect comma delimited list of doubles

Advanced things:

  1. woah, "1,," shouldn't have parsed!

    It didn't :) The parser as formulated doesn't require the full input to be consumed. Fix: append >> eoi:

    bool ok = parse(source.begin(), source.end(), double_ [ px::ref(result) += _1 ] % ',' >> eoi);
    

    Now the relevant test case prints

    '1,,' -> '1,,' -> source: expect comma delimited list of doubles
    

    What if we want the diagnostic to mention that the end of input (eoi) was expected? Make it an expectation point > eoi:

    bool ok = parse(source.begin(), source.end(), double_ [ px::ref(result) += _1 ] % ',' > eoi);
    

    Now prints

    '1,,' -> '1,,' -> boost::spirit::qi::expectation_failure
    

    Which can be improved by handling that exception type:

    Live On Coliru

    Prints

    '1,,' -> Expecting <eoi> at ',,'
    
  2. How about accepting spaces?

    Just use phrase_parse, which applies a skipper outside lexemes²:

    bool ok = phrase_parse(source.begin(), source.end(), double_ [ px::ref(result) += _1 ] % ',' > eoi, blank);
    

    Now everything blank is ignored in between the primitives:

    test("   1, 2   ");
    

    Prints

    '   1, 2   ' -> 3
    
  3. How to package it up as a rule?

    Like I mentioned, realize you can use the rule's exposed attribute as accumulator register:

    namespace Parsers {
        static const qi::rule<iterator, double(), qi::blank_type> product
            = qi::eps [ qi::_val = 0 ] // initialize
            >> qi::double_ [ qi::_val += qi::_1 ] % ','
            ;
    }
    

    Live On Coliru

    Printing the same results as before


¹ bear in mind that summation is an interesting subject, http://www.partow.net/programming/sumtk/index.html

² primitive parsers are implicitly lexemes, lexeme[] directives inhibit skipping and rules declared without a skipper are implicitly lexemes: Boost spirit skipper issues

³ PS. There's a subtlety at play here. Had you not written %= but just = the value would have been indeterminate: http://www.boost.org/doc/libs/1_65_1/libs/spirit/doc/html/spirit/qi/reference/nonterminal/rule.html#spirit.qi.reference.nonterminal.rule.expression_semantics

sehe
  • And `qi::double_ [ qi::_val += qi::_1 ] % ','` interprets as a list of doubles, separated by commas, correct? – Carbon Nov 29 '17 at 21:04
  • Simplified by dropping the semantic action: [`double_ % ','`](http://www.boost.org/doc/libs/1_65_1/libs/spirit/doc/html/spirit/qi/reference/operator/list.html) does, yes. – sehe Nov 29 '17 at 21:05
  • For fun: separating parsing from summing makes _everything_ a lot simpler: **[Live On Coliru](http://coliru.stacked-crooked.com/a/e5d33ef9d196be20)**. I think you should do this, for small/bounded lists. (PSA: Coliru cannot usually compile these samples, I have backdoor access) – sehe Nov 29 '17 at 21:10
  • Yep, the point of this was to figure out how I can make complicated rules, as needed. I think I understand the val notation. I really like Spirit (to the extent one can like being that deep in the preprocessor) so far. – Carbon Nov 29 '17 at 21:12
  • I like Spirit, because (a) it enables rapid prototyping (b) it doesn't involve the preprocessor :) You obviously meant generic template instantiations. – sehe Nov 29 '17 at 21:13
  • Oh - I just spent a bit of time fighting through the `boost::Fusion` bits for structs with too many elements to go through BOOST_FUSION_ADAPT_STRUCT - getting my head around that whole thing was interesting... I guess there isn't any preprocessor stuff in Spirit, it's an optional feature. – Carbon Nov 29 '17 at 21:17
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/160140/discussion-between-sehe-and-carbon). – sehe Nov 29 '17 at 21:17