Boost spirit skip parser with at least one whitespace

Question

In the grammar I'm implementing, there are elements separated by whitespace. With a skip parser, the spaces between the elements are skipped automatically, but this also allows no space at all, which is not what I want. Sure, I could explicitly write a grammar that includes these spaces, but given the complexity and flexibility Spirit offers, it seems to me that there should be a better way to do this. Is there? Here is an example:

#include <cstdlib>
#include <iostream>
#include <string>

#include <boost/spirit/include/qi.hpp>    

namespace qi = boost::spirit::qi;

int main(int argc, char** argv)
{
    if(argc != 2)
    {
        std::exit(1);
    }
    std::string str = argv[1];
    auto iter = str.begin();
    bool r = qi::phrase_parse(iter, str.end(), qi::char_ >> qi::char_, qi::blank);

    if (r && iter == str.end())
    {
        std::cout << "parse succeeded\n";
    }
    else
    {
        std::cout << "parse failed. Remaining unparsed: " << std::string(iter, str.end()) << '\n';
    }
}

This allows ab as well as a b. I want only the latter to be allowed.

Related to this: how exactly do skip parsers work? One supplies something like qi::blank; is the Kleene star then applied to it to form the skip parser? I would like some enlightenment here; maybe it also helps with solving this problem.

Additional information: My real parser looks something like this:

one   = char_("X") >> repeat(2)[omit[+blank] >> +alnum] >> qi::omit[+qi::blank] >> +alnum;
two   = char_("Y") >> repeat(3)[omit[+blank] >> +alnum];
three = char_("Z") >> repeat(4)[omit[+blank] >> +alnum] >> qi::omit[+qi::blank] >> +alnum;

main = one | two | three;

This makes the grammar quite noisy, which I would like to avoid.

The problem in your example is that `char_` matches any character (including whitespace). You can fix the example by using `~blank` or `~char_(' ')`. Your "real" parsers (with the `omit[+blank]` parts removed) do not have this problem and should run fine with a `blank` skipper if you use `lexeme` properly. — Nikita Kniazev, Nov 02 '18 at 11:08

sehe · Accepted Answer · 2018-11-02T00:03:15.377

First off, the grammar specs I usually see this kind of requirement in are (always?) RFCs. In 99% of cases there is no issue, consider e.g.:

 myrule = skip(space) [ uint_ >> uint_ ];

This already implicitly requires at least one whitespace character between the numbers, for the simple reason that there would otherwise be just one number. The same simplification occurs in surprisingly many cases (see e.g. the simplifications made around the ubiquitous WSP productions in last week's answer to "Boost.Spirit qi value sequence vector").

With that out of the way: skippers apply zero or more times, by definition, so no, there is no way to get what you want with an existing stateful directive like skip(). See also http://stackoverflow.com/questions/17072987/boost-spirit-skipper-issues/17073965#17073965 or the docs (under lexeme, [no_]skip and skip_flag::dont_postskip).

Looking at your specific grammar, I'd do this:

bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);

Here, you can add a negative lookahead assertion inside a lexeme to assert that "the end of the token was reached" - which in your parser would be mandated as !qi::graph:

    auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);

See a demo:


#include <iostream>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main() {
    for (std::string const str : { "ab", " ab ", " a b ", "a b" }) {
        auto iter = str.begin(), end = str.end();

        auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);

        bool r = qi::phrase_parse(iter, end, token >> token, qi::blank);

        std::cout << " --- " << std::quoted(str) << " --- ";
        if (r) {
            std::cout << "parse succeeded.";
        } else {
            std::cout << "parse failed.";
        }

        if (iter != end) {
            std::cout << " Remaining unparsed: " << std::string(iter, str.end());
        }

        std::cout << std::endl;
    }
}

Prints

 --- "ab" --- parse failed. Remaining unparsed: ab
 --- " ab " --- parse failed. Remaining unparsed:  ab 
 --- " a b " --- parse succeeded.
 --- "a b" --- parse succeeded.

BONUS Review notes

My guidelines would be:

1. Your skipper should be the grammar's responsibility. It's sad that all Qi samples lead people to believe you need to let the caller decide that.
2. End-iterator checking does not equal error-checking. It's very possible to parse things correctly without consuming all input, which is why reporting the "remaining input" should not just happen in the case that parsing failed.
3. If trailing unparsed input is an error, spell it out:


#include <iostream>
#include <iomanip>
#include <boost/spirit/include/qi.hpp>

namespace qi = boost::spirit::qi;

int main() {
    for (std::string const str : { "ab", " ab ", " a b ", "a b happy trees are trailing" }) {
        auto iter = str.begin(), end = str.end();

        auto token = qi::copy(qi::lexeme [ qi::char_ >> !qi::graph ]);

        bool r = qi::parse(iter, end, qi::skip(qi::space) [ token >> token >> qi::eoi ]);

        std::cout << " --- " << std::quoted(str) << " --- ";
        if (r) {
            std::cout << "parse succeeded.";
        } else {
            std::cout << "parse failed.";
        }

        if (iter != end) {
            std::cout << " Remaining unparsed: " << std::quoted(std::string(iter, str.end()));
        }

        std::cout << std::endl;
    }
}

Prints

 --- "ab" --- parse failed. Remaining unparsed: "ab"
 --- " ab " --- parse failed. Remaining unparsed: " ab "
 --- " a b " --- parse succeeded.
 --- "a b happy trees are trailing" --- parse failed. Remaining unparsed: "a b happy trees are trailing"

Phew, you give out a lot to chew ;) Thanks for the detailed answer, as well as the information on more general topics. Considering your first point: But this only works when you are not using something like `+digit >> +digit`, doesn't it? Because in this case, there is no way of figuring out which digit belongs where. Using a skip parser for spaces, `123 456` is the same as `123456`, the same as `12 3456` etc. (if I understand correctly). — pschulz, Nov 01 '18 at 17:41
Considering your solution: What is `qi::copy` for? I haven't seen it before and a quick look through the documentation turned up nothing. — pschulz, Nov 01 '18 at 17:42
@pschulz See https://stackoverflow.com/questions/53033501/boost-spirit-qi-crashes-for-memory-violation/53035940#53035940 and the many links in it :) — sehe, Nov 01 '18 at 17:43
Bonus points: 1. You are right, I will change that. 2. Yep, only a test case, since the parser is currently far from working well ;) 3. Will do :) — pschulz, Nov 01 '18 at 17:44
Btw: what god granted you this unlimited spirit knowledge? Do you belong to the development team? — pschulz, Nov 01 '18 at 17:45
Thanks for the link, so if I'm not using `auto`, I don't need `qi::copy`? — pschulz, Nov 01 '18 at 17:52
Upvoted for my agreement. I just like Qi and happen to focus on Boost questions on StackOverflow. This brings a lot of accidental experience. — sehe, Nov 01 '18 at 17:58
I just noticed I missed the question in your first comment. Yes, you understand correctly. Indeed `+digit >> +digit` is different because `+digit` is not a lexeme, whereas `qi::uint_` is - implicitly. The following would be fine: `lexeme[+digit] >> lexeme[+digit]` - making sure that no spaces are skipped inside the lexemes. — sehe, Nov 02 '18 at 00:02
Thank you for your comments, this helped me a lot. You are using an explicit skip and lexeme, but if I use rules that accept a skipper with subrules that do *not* accept a skipper, the result should be the same, shouldn't it? At least my test cases work as expected. Are there any pitfalls to this approach? — pschulz, Nov 02 '18 at 07:48
Yes. Skippers are skippers, regardless of who specifies them. This **also** means that a rule is a lexeme if it is declared without a skipper (even if the surrounding grammar uses a skipper). — sehe, Nov 02 '18 at 09:43
