
I have a file containing data in the form:

fractal mand1 {
    ;lkkj;kj;
}

fractal mand2 {
    if (...) {
        blablah;
    }
}

fractal julia1 {
    a = ss;
}

I want to extract the names of the data containers, so in this specific case I want to retrieve a vector containing mand1, mand2, julia1.

I've read the sample about parsing a number list into a vector, but I want to maintain the grammar in a separate file.

I've created a struct representing the grammar, and I use it to parse the string containing the data. I would expect output like

mand1
mand2
julia1

Instead I obtain

mand1 {
        ;lkkj;kj;
    }

    fractal mand2 {
        if (...) {
            blablah;
        }
    }

    fractal julia1 {
        a = ss;
    }

My parser recognizes the first fractal term, but then it parses the rest of the file as a single string item instead of parsing it the way I want.

What am I doing wrong?

#include <boost/spirit/include/qi.hpp>
#include <string>
#include <vector>
#include <iostream>

using boost::spirit::ascii::space;
using boost::spirit::ascii::space_type;
using boost::spirit::qi::phrase_parse;
using boost::spirit::qi::lit;
using boost::spirit::qi::lexeme;
using boost::spirit::qi::skip;
using boost::spirit::ascii::char_;
using boost::spirit::ascii::no_case;
using boost::spirit::qi::rule;

typedef std::string::const_iterator sit;

template <typename Iterator>
struct FractalListParser : boost::spirit::qi::grammar<Iterator, std::vector<std::string>(), boost::spirit::ascii::space_type> {
    FractalListParser() : FractalListParser::base_type(start)   {

        no_quoted_string %= *(lexeme[+(char_ - '"')]);
        start %= *(no_case[lit("fractal")] >> no_quoted_string >> '{' >> *(skip[*(char_)]) >> '}');
    }

    rule<Iterator, std::string(), space_type> no_quoted_string;
    rule<Iterator, std::vector<std::string>(), space_type> start;
};

int main() {

    const std::string fractalListFile(R"(
    fractal mand1 {
        ;lkkj;kj;
    }

    fractal mand2 {
        if (...) {
            blablah;
        }
    }

    fractal julia1 {
        a = ss;
    }
    )");

    std::cout << "Read Test:" << std::endl;
    FractalListParser<sit> parser;
    std::vector<std::string> data;
    bool r = phrase_parse(fractalListFile.begin(), fractalListFile.end(), parser, space, data);
    for (auto& i : data) std::cout << i << std::endl;
    return 0;
}
Jepessen

1 Answer

If you add error handling, you'll find that the parse effectively failed: although phrase_parse reports success, the whole input is left unparsed:

Live On Coliru

Output:

Read Test:
Parse success:
----
mand1 {
    ;lkkj;kj;
}

fractal mand2 {
    if (...) {
        blablah;
    }
}

fractal julia1 {
    a = ss;
}

Remaining unparsed input: 'fractal mand1 {
    ;lkkj;kj;
}

fractal mand2 {
    if (...) {
        blablah;
    }
}

fractal julia1 {
    a = ss;
}
'

What was the problem?

  1. You probably want to ignore the "body" (between {}). Therefore I suppose you actually wanted to omit the attribute:

         >> '{' >> *(omit[*(char_)]) >> '}'
    

    rather than skip[*(char_)].

  2. The expression *char_ is greedy, and will always match to the end of input... You probably wanted to limit the charset:

    • in the "name" *~char_("\"{") to avoid "eating" all of the body as well. To avoid matching spaces use graph (e.g. +(graph - '"')). In case you want to parse "identifiers", be explicit, e.g.

      alpha > *(alnum | char_('_'))
      
    • in the body *~char_('}') or *(char_ - '}') (the latter being less efficient).

  3. The nesting of optional quantifiers is not productive:

    *(omit[*(char_)])
    

    It will just have a very slow worst-case runtime (because *char_ could be empty, and *(omit[*(char_)]) could also be empty). Say what you mean instead:

    omit[*char_]
    
  4. The simplest way to have a lexeme is to drop the skipper from the rule declaration (see also Boost spirit skipper issues)

Program logic:

  1. Since your sample contains nested blocks (mand2 for example), you need to treat the blocks recursively in order to avoid calling the first } the end of the outer block:

    block = '{' >> -block % (+~char_("{}")) >> '}';
    

Loose hints:

  1. Use BOOST_SPIRIT_DEBUG to find out where parsing is rejected/matched. E.g. after refactoring the rules a bit, we got this output (On Coliru):

    Read Test:
    <start>
    <try>fractal mand1 {\n    </try>
    <no_quoted_string>
        <try>mand1 {\n    ;lkkj;kj</try>
        <success> {\n    ;lkkj;kj;\n}\n\n</success>
        <attributes>[[m, a, n, d, 1]]</attributes>
    </no_quoted_string>
    <body>
        <try>{\n    ;lkkj;kj;\n}\n\nf</try>
        <fail/>
    </body>
    <success>fractal mand1 {\n    </success>
    <attributes>[[]]</attributes>
    </start>
    Parse success:
    Remaining unparsed input: 'fractal mand1 {
        ;lkkj;kj;
    }
    
    fractal mand2 {
        if (...) {
            blablah;
        }
    }
    
    fractal julia1 {
        a = ss;
    }
    '
    

    That output helped me spot that I actually forgot the - '}' part in the body rule... :)

  2. No need for %= when there are no semantic actions involved in that rule definition (docs)

  3. you probably want to make sure fractal is actually a separate word, so you don't match fractalset multi { .... }


Demo Program

With these in place we can have a working demo:

Live On Coliru

//#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

template <typename Iterator>
struct FractalListParser : qi::grammar<Iterator, std::vector<std::string>(), qi::space_type> {
    FractalListParser() : FractalListParser::base_type(start)   {
        using namespace qi;

        identifier = alpha > *(alnum | char_('_'));
        block      = '{' >> -block % +~char_("{}") >> '}';

        start      = *(
                        no_case["fractal"] >> identifier >> block
                   );

        BOOST_SPIRIT_DEBUG_NODES((start)(block)(identifier))
    }

    qi::rule<Iterator, std::vector<std::string>(), qi::space_type> start;
    // lexemes (just drop the skipper)
    qi::rule<Iterator, std::string()> identifier;
    qi::rule<Iterator> block; // leaving out the attribute means implicit `omit[]`
};

int main() {

    using It = boost::spirit::istream_iterator;
    It f(std::cin >> std::noskipws), l;

    std::cout << "Read Test:" << std::endl;

    FractalListParser<It> parser;

    std::vector<std::string> data;
    bool r = qi::phrase_parse(f, l, parser, qi::space, data);
    if (r) {
        std::cout << "Parse success:\n";
        for (auto& i : data)
            std::cout << "----\n" << i << "\n";
    } else {
        std::cout << "Parse failed\n";
    }

    if (f != l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}

Prints:

Read Test:
Parse success:
----
mand1
----
mand2
----
julia1
sehe
  • For fun and glory, added [an X3 version](http://coliru.stacked-crooked.com/a/1c88f93c2a4010b7) and here's the recorded live stream: https://www.livecoding.tv/video/fixing-a-nested-blocks-grammar-in-qi-x3/ – sehe Nov 28 '15 at 22:09
  • `in the "name" *~char_("\{") to avoid` there something wrong – Tomilov Anatoliy Nov 29 '15 at 21:43
  • @Orient that's not required afaik, what's your source? – sehe Nov 29 '15 at 22:12
  • *~char_("\"{) try to search this string on the page. I don't know why cite is pasted wrong. – Tomilov Anatoliy Nov 30 '15 at 01:49
  • @Orient Ah. You meant to highlight a typo :) Well, it needs to be `*~char_("\"{")` which is why I didn't get it :) Thanks – sehe Nov 30 '15 at 03:14