Using Boost.Spirit, I'd like to extract a string that is followed by some data in parentheses. The string is separated from the opening parenthesis by a space. Unfortunately, the string itself may contain spaces. I'm looking for a concise solution that returns the string without the trailing space.

The following code illustrates the problem:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>
#include <iostream>

namespace qi = boost::spirit::qi;
using std::string;
using std::cout;
using std::endl;

void
test_input(const string &input)
{
    string::const_iterator b = input.begin();
    string::const_iterator e = input.end();
    string parsed;
    bool const r = qi::parse(b, e,
        *(qi::char_ - qi::char_("(")) >> qi::lit("(Spirit)"),
            parsed
    );
    if(r) {
        cout << "PASSED:" << endl;
    } else {
        cout << "FAILED:" << endl;
    }
    cout << "  Parsed: \"" << parsed << "\"" << endl;
    cout << "  Rest: \"" << string(b, e) << "\"" << endl;
}

int main()
{
    test_input("Fine (Spirit)");
    test_input("Hello, World (Spirit)");

    return 0;
}

Its output is:

PASSED:
  Parsed: "Fine "
  Rest: ""
PASSED:
  Parsed: "Hello, World "
  Rest: ""

With this simple grammar, the extracted string always ends with a trailing space, which I'd like to eliminate.

The solution should work within Spirit since this is only part of a larger grammar. (Thus, it would probably be clumsy to trim the extracted strings after parsing.)

Thank you in advance.

  • Always a space and only a space? If that is the case I think `*(qi::char_ - qi::lit(" ("))` should work, although there is probably a better answer. – llonesmiz Oct 25 '13 at 14:31
  • Thank you very much! With my test case, this appears to work. (Although I don't claim to understand why (yet): Matching single characters that don't contain a literal!?) – Carsten Scholtes Oct 25 '13 at 14:48
  • Unlike `~`, the difference parser is not something specific to qi::char_ (although it is frequently used with it). The binary operator `-` succeeds if its second argument fails and its first succeeds. In your example, while `qi::lit(" (")` fails to match, your expression keeps adding chars to its synthesized attribute. – llonesmiz Oct 26 '13 at 05:32
  • +1 Thank you for explaining the code you proposed. I believe this highlights an important aspect of the inner workings of Spirit that is not so obvious: The second part of the difference operator is not matched against the match of the first part but rather against the same rest of the input that the first part was matched against. Furthermore, in my view, your solution describes an essential element for building concise and readable expressions. – Carsten Scholtes Nov 10 '13 at 08:58
  • For completeness, it should be mentioned that the space excluded from the string must be added in front of the opening parenthesis, yielding: `*(qi::char_ - qi::lit(" (")) >> qi::lit(" (Spirit)")`. If you'd put this together in an answer, I'd gladly mark it as the most suitable solution. – Carsten Scholtes Nov 10 '13 at 08:59

1 Answer

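As the comment says, if the separator is always exactly one space, you can simply hard-code it. If you need to be more flexible or tolerant, I'd use a skipper together with `raw` to "cheat" the skipper for your purposes. The skipper discards the whitespace in front of the parenthesis, while `raw` exposes the matched input range as it appears in the source, so the spaces inside the string survive: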

bool const r = qi::phrase_parse(b, e,
    qi::raw [ *(qi::char_ - qi::char_("(")) ] >> qi::lit("(Spirit)"),
    qi::space,
    parsed
);

This works, and prints

PASSED:
  Parsed: "Fine"
  Rest: ""
PASSED:
  Parsed: "Hello, World"
  Rest: ""

Full program for reference:

#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <string>
#include <iostream>

namespace qi = boost::spirit::qi;
using std::string;
using std::cout;
using std::endl;

void
test_input(const string &input)
{
    string::const_iterator b = input.begin();
    string::const_iterator e = input.end();
    string parsed;
    bool const r = qi::phrase_parse(b, e,
        qi::raw [ *(qi::char_ - qi::char_("(")) ] >> qi::lit("(Spirit)"),
        qi::space,
        parsed
    );
    if(r) {
        cout << "PASSED:" << endl;
    } else {
        cout << "FAILED:" << endl;
    }
    cout << "  Parsed: \"" << parsed << "\"" << endl;
    cout << "  Rest: \"" << string(b, e) << "\"" << endl;
}

int main()
{
    test_input("Fine (Spirit)");
    test_input("Hello, World (Spirit)");

    return 0;
}
sehe
  • @cv_and_he Thanks. There's a time and place for everything: `raw` assumes the exact matched input sequence is your attribute data. If that's not the case, you'd need to add postprocessing (_semantic actions_?) or be better off writing a more elaborate grammar. – sehe Oct 25 '13 at 18:26
  • Thank you for proposing this interesting option. If I understand correctly, `raw` provides a flat string instead of an attribute that mirrors the hierarchy of the expression inside `raw`. I'm still hesitant to mark this as a solution since I had to introduce skip parsers in the existing code. The comment of cv_and_he appears to be more to the point. – Carsten Scholtes Nov 10 '13 at 09:14
  • @CarstenScholtes `qi::raw` exposes an iterator range: ["The raw[\] disregards the attribute of its subject parser, instead exposing the half-open range [first, last) pointing to the matched characters from the input stream"](http://www.boost.org/doc/libs/1_54_0/libs/spirit/doc/html/spirit/qi/reference/directive/raw.html) – sehe Nov 10 '13 at 11:26
  • @CarstenScholtes On your choice, see the first line(s) of my answer. It's all about what input you must support. – sehe Nov 10 '13 at 11:28