Extracting a string from brackets using Boost Spirit

Question

I have the following string:

%%DocumentNeededResources: CMap (90pv-RKSJ-UCS2C)

I want to parse it and extract the string 90pv-RKSJ-UCS2C, which is inside the parentheses.

My rule is as follows:

std::string strLinesRecur = "%%DocumentNeededResources: CMap (90pv-RKSJ-UCS2C)";
std::string strStartTokenRecur;
std::string token_intRecur;
bool bParsedLine1 = qi::phrase_parse(strLinesRecur.begin(), strLinesRecur.end(), +char_>>+char_,':', token_intRecur, strStartTokenRecur);

score 1 · Accepted Answer · edited May 23 '17 at 12:06

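It looks like you thought a skipper is a split delimiter. It's quite the opposite: a skipper matches input that the parser ignores between tokens, so a delimiter you care about has to be part of the grammar itself (see "Boost spirit skipper issues").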

In this rare circumstance I think I'd prefer a regular expression. But since you asked, here's the Spirit take:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

int main() {
    std::string const line = "%%DocumentNeededResources: CMap (90pv-RKSJ-UCS2C)";

    auto first = line.begin(), last = line.end();

    std::string label, token;
    bool ok = qi::phrase_parse(
            first, last, 
            qi::lexeme [ "%%" >> +~qi::char_(":") ] >> ':' >> qi::lexeme["CMap"] >> '(' >> qi::lexeme[+~qi::char_(')')] >> ')',
            qi::space,
            label, token);

    if (ok)
        std::cout << "Parse success: label='" << label << "', token='" << token << "'\n";
    else
        std::cout << "Parse failed\n";

    if (first!=last)
        std::cout << "Remaining unparsed input: '" << std::string(first, last) << "'\n";
}

Prints

Parse success: label='DocumentNeededResources', token='90pv-RKSJ-UCS2C'
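
For comparison, the regular-expression route mentioned at the top of this answer could look roughly like the sketch below. It uses only std::regex from the standard library. The helper name extract_parenthesized and the exact pattern are assumptions made for illustration, not something from the original answer; the pattern assumes a line of the form "%%Label: Word (token)".

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the original answer): on a line shaped like
// "%%Label: Word (token)", stores the text between the parentheses in `token`
// and returns true. Returns false if the line does not have that shape.
bool extract_parenthesized(std::string const& line, std::string& token) {
    // %%  label up to ':'  one word  '(' captured token ')'  optional trailing space
    static std::regex const pattern(R"re(%%[^:]+:\s*\w+\s*\(([^)]+)\)\s*)re");

    std::smatch match;
    if (!std::regex_match(line, match, pattern))  // the whole line must match
        return false;

    token = match[1].str();  // capture group 1 is the text inside the parentheses
    return true;
}
```

Calling extract_parenthesized with the line from the question and a std::string to receive the result should fill that string with the text between the parentheses.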

score 1 · Answer 2 · answered Jul 06 '15 at 11:04

Alright, so assuming we are given the following using directive and namespace alias:

#include <boost/spirit/include/qi.hpp>
#include <boost/phoenix.hpp>
using namespace boost::spirit::qi;
namespace phx = boost::phoenix;

And given the string:

std::string strLinesRecur = "%%DocumentNeededResources: CMap (90pv-RKSJ-UCS2C)";

We would like to extract the "code" inside the parentheses into res:

std::string res;

One way to do this is to use boost::phoenix::ref in a semantic action. So, given a parser for the code such as:

using boost::spirit::ascii::alnum;
auto code = copy(+(alnum | char_('-')));

(Which is along the lines of what in a regex would be [a-zA-Z0-9-]+)

We can create our own grammar for the whole string:

using boost::spirit::ascii::alpha;
auto grammar = copy(
    (char_('%') >> char_('%') >> +alpha >> char_(':')) 
        >> +alpha >> char_('(') >> as_string[lexeme[code]][phx::ref(res) = _1] >> char_(')'));

Which parses anything that begins with two %, continues with some alphabetic characters and a :, then another alphabetic word, and finally some "code" within parentheses.

The key part is as_string[lexeme[code]][phx::ref(res) = _1]. Breaking it down: lexeme[code] disables skipping inside code, so the code is matched as one contiguous unit; as_string exposes the result as a std::string (as opposed to std::vector<char>); and [phx::ref(res) = _1] is a semantic action that stores the parsed string in res (_1 is a placeholder for the attribute of the parser the action is attached to).

In this case spaces are skipped by the following call:

using boost::spirit::ascii::blank;
phrase_parse(begin(strLinesRecur), end(strLinesRecur), grammar, blank);

Live demo

This is of course just an example of a grammar that would fit the string.

Note: copy refers to qi::copy, which is one way to store pieces of a grammar in auto variables such as code and grammar. Without it, the parser expression keeps references to temporaries that no longer exist by the time the parser runs, so the use of auto will fail (probably with a segmentation fault).

Hmm I wasn't aware of `qi::copy` o.O I always use `boost::proto::deep_copy` or `BOOST_SPIRIT_AUTO` (which has the benefit of working in c++03 as well). — sehe, Jul 06 '15 at 23:29
Here I offer you my (vastly :)) simplified version based on your take: **[Live On Coliru](http://coliru.stacked-crooked.com/a/baba2953f7735d29)** — sehe, Jul 06 '15 at 23:47
