
I'm writing a Qi parser to parse IRC messages, transcribing RFC 2812. The grammar includes a completely mundane alternative:

auto const hostname = shortname >> *('.' >> shortname);
auto const nickUserHost = nickname >> -(-('!' >> user) >> '@' >> host);

auto const prefix = hostname | nickUserHost;

(Full code on Coliru here)

I'm baffled to see that my test string ("D-z!D-z@mib-A3A026FF.rev.sfr.net") matches nickUserHost, but not prefix.

The only remarkable thing that I see is that nickUserHost's host is itself defined in terms of hostname, but I'm not sure how it would affect the parsing in any way.

Asked by Quentin (edited by Community)

1 Answer


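By appending `>> eoi` you explicitly make the parse fail if it doesn't reach the end of the input. `prefix` commits to its first successful branch: `hostname` matches just `D-z`, so `nickUserHost` is never tried, and `eoi` then fails. Without `eoi`, the difference shows up as unconsumed input: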

Live On Coliru

#include <string>
#include <iostream>
#include <iomanip>

#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

template <typename Expr>
void test(std::string name, Expr const& expr) {
    std::string const test = "D-z!D-z@mib-A3A026FF.rev.sfr.net";

    auto f = begin(test);
    bool ok = qi::parse(f, end(test), expr);
    std::cout << name << ": " << ok << "\n";
    if (f != end(test))
        std::cout << " -- remaining input: '" << std::string(f, end(test)) << "'\n";
}

int main() {
    using It = std::string::const_iterator;

    // Qi expressions hold their operands by reference, so keeping them in
    // `auto` variables leaves dangling references (undefined behaviour).
    // A qi::rule stores its own copy of the parser instead.
    qi::rule<It> hexdigit = qi::char_("0123456789ABCDEF");
    qi::rule<It> special  = qi::char_("\x5b-\x60\x7b-\x7d");

    qi::rule<It> oneToThreeDigits = qi::repeat(1, 3)[qi::digit];
    qi::rule<It> ip4addr = oneToThreeDigits >> '.' >> oneToThreeDigits >> '.' >> oneToThreeDigits >> '.' >> oneToThreeDigits;
    qi::rule<It> ip6addr = +(hexdigit >> qi::repeat(7)[':' >> +hexdigit]) | ("0:0:0:0:0:" >> (qi::lit('0') | "FFFF") >> ':' >> ip4addr);
    qi::rule<It> hostaddr = ip4addr | ip6addr;

    qi::rule<It> nickname = (qi::alpha | special) >> qi::repeat(0, 8)[qi::alnum | special | '-'];
    qi::rule<It> user = +(~qi::char_("\x0d\x0a\x20\x40"));

    qi::rule<It> shortname = qi::alnum >> *(qi::alnum | '-');
    qi::rule<It> hostname = shortname >> *('.' >> shortname);
    qi::rule<It> host = hostname | hostaddr;

    qi::rule<It> nickUserHost = nickname >> -(-('!' >> user) >> '@' >> host);

    // The problematic alternative: hostname already succeeds on "D-z",
    // so nickUserHost is never tried
    qi::rule<It> prefix = hostname | nickUserHost;

    std::cout << std::boolalpha;
    test("hostname",     hostname);
    test("nickUserHost", nickUserHost);
    test("prefix",       prefix);
}

Prints

hostname: true
-- remaining input: '!D-z@mib-A3A026FF.rev.sfr.net'
nickUserHost: true
prefix: true
-- remaining input: '!D-z@mib-A3A026FF.rev.sfr.net'
Answered by sehe (edited by Tomilov Anatoliy)
  • This is on purpose, because I expect it to match the whole input. Does that actually interfere with `|`'s behaviour? – Quentin Jan 17 '16 at 14:33
  • @Quentin no, but `hostname` succeeds without consuming the whole input (and accordingly `nickUserHost` is never tried) and then the `eoi` makes your parse fail. You need to put `eoi` in each of the branches of `|`. – llonesmiz Jan 17 '16 at 14:37
  • Erm. I just missed you're not using Spirit X3. **`auto` is not suitable for parser expressions**. See https://stackoverflow.com/questions/26410498/undefined-behaviour-somewhere-in-boostspiritqiphrase-parse/26411266#26411266 – sehe Jan 17 '16 at 14:37
  • @sehe True, although in [this case](http://coliru.stacked-crooked.com/a/65d9ed6917f493ea) this doesn't change anything. – llonesmiz Jan 17 '16 at 14:39
  • @cv_and_he It does. GCC crashes on it: http://coliru.stacked-crooked.com/a/135e931117b21948 (UB is UB). Quentin: see the code supplied by cv_and_he – sehe Jan 17 '16 at 14:42
  • @sehe Bad choice of words. I meant that removing the `auto`s presented the same result. PS: [This](http://coliru.stacked-crooked.com/a/f548d3ddae43f3be) seems really evil, but it works (right?). – llonesmiz Jan 17 '16 at 14:44
  • It's less evil if you make it a proper assert `&eoi`. Or maybe the OP can reorder the branches `nickUserHost | hostname` – sehe Jan 17 '16 at 14:50
  • That is a whole bunch of useful information, thank you very much! This is a just-begun toy project to discover Spirit, so I think I will switch to X3 now that I know it exists. About the `|` issue, is simply switching both sides of the `|`, so that the most "complex" one comes first, the right approach? – Quentin Jan 17 '16 at 14:51
  • Depends on the grammar. I think you're doing IRC there, so rings true as far as I remember – sehe Jan 17 '16 at 14:51
  • @sehe I think you may have missed the `namespace qi=boost::spirit::x3`. But you are totally right `&qi::eoi` makes more sense. – llonesmiz Jan 17 '16 at 14:59
  • Lol. That's devious. – sehe Jan 17 '16 at 15:00