
I am relatively new to Spirit Qi, and am trying to parse an assembler-like language.

For example, I'd like to parse:

Func Ident{
    Mov name, "hello"
    Push 5
    Exit
}

So far, so good. I can parse it properly. However, the error handler sometimes comes up with strange error locations. Take for example the following faulty code:

Func Ident{
    Mov name "hello" ; <-- comma is missing here
    Push 5
    Exit
}

Here are the rules involved in this parsing:

    gr_function = lexeme["Func" >> !(alnum | '_')] // Ensure whole words
                    > gr_identifier
                    > "{"
                    > *( gr_instruction
                        |gr_label
                        |gr_vardecl
                        |gr_paramdecl)
                    > "}";

    gr_instruction = gr_instruction_names
                     > gr_operands;

    gr_operands = -(gr_operand % ',');

The parser notices the error, but complains about a missing "}" after the Mov. I have a feeling that the issue is in the definition for "Func", but I cannot pinpoint it. I'd like the parser to complain about a missing ",". It would be OK if it also complained about consequential errors, but it should definitely pinpoint the missing comma as the culprit.

I have tried variations such as:

gr_operands = -(gr_operand 
                >> *(','
                     > gr_operand)
                );

And others, but with other strange errors.

Does anyone have an idea of how to say "Ok, you may have an instruction without operands, but if you find one, and there is no comma before the next, fail at the comma"?

UPDATE

Thank you for your answers so far. The gr_operand is defined as follows:

    gr_operand = ( gr_operand_intlit
                  |gr_operand_flplit
                  |gr_operand_strlit
                  |gr_operand_register
                  |gr_operand_identifier);

    gr_operand_intlit = int_;

    gr_operand_flplit = double_;

    gr_operand_strlit = '"'
                        > strlitcont
                        > '"'
                    ;

    gr_operand_register = gr_register_names;

    // TODO: Must also not accept the keywords from the statement grammar
    gr_operand_identifier = !(gr_instruction_names | gr_register_names)
                            >> raw[
                                    lexeme[(alpha | '_') >> *(alnum | '_')]
                                  ];

    escchar.name("\\\"");
    escchar     = '\\' >> char_("\"");

    strlitcont.name("String literal content");
    strlitcont  = *( escchar | ~char_('"') );
namezero
  • It cannot parse the "name" by any rule, so the `*(...)` fails after the "Mov" and a "}" is required. Could you give the full definitions for `gr_instruction` and all rules needed for that? – Mike M Aug 23 '13 at 12:09
  • Done. Here is everything that gr_instruction relies on. – namezero Aug 23 '13 at 14:17

1 Answer

You'll want to make it explicit what can be an operand. I guessed this:

gr_operand    = gr_identifier | gr_string;
gr_string     = lexeme [ '"' >> *("\"\"" | ~char_("\"")) >> '"' ];

Unrelated, but you might want to make it clear that a newline starts a new statement (using blank_type as the skipper):

        >> "{"
        >> -(
                  gr_instruction
                | gr_label
                | gr_vardecl
                | gr_paramdecl
            ) % eol
        > "}";

Now, the parser will be able to complain that it expects a newline at the time of parse fail.

I made up a fully working sample using your sketches in the original post.

See it live on Coliru:

#define BOOST_SPIRIT_DEBUG
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>

namespace qi    = boost::spirit::qi;

template <typename It, typename Skipper = qi::blank_type>
    struct parser : qi::grammar<It, Skipper>
{
    parser() : parser::base_type(start)
    {
        using namespace qi;

        start = lexeme["Func" >> !(alnum | '_')] > function;
        function = gr_identifier
                    >> "{"
                    >> -(
                              gr_instruction
                            //| gr_label
                            //| gr_vardecl
                            //| gr_paramdecl
                        ) % eol
                    > "}";

        gr_instruction_names.add("Mov", unused);
        gr_instruction_names.add("Push", unused);
        gr_instruction_names.add("Exit", unused);

        gr_instruction = lexeme [ gr_instruction_names >> !(alnum|"_") ] > gr_operands;
        gr_operands = -(gr_operand % ',');

        gr_identifier = lexeme [ alpha >> *(alnum | '_') ];
        gr_operand    = gr_identifier | gr_string;
        gr_string     = lexeme [ '"' >> *("\"\"" | ~char_("\"")) >> '"' ];

        BOOST_SPIRIT_DEBUG_NODES((start)(function)(gr_instruction)(gr_operands)(gr_identifier)(gr_operand)(gr_string));
    }

  private:
    qi::symbols<char, qi::unused_type> gr_instruction_names;
    qi::rule<It, Skipper> start, function, gr_instruction, gr_operands, gr_identifier, gr_operand, gr_string;
};

int main()
{
    typedef boost::spirit::istream_iterator It;
    std::cin.unsetf(std::ios::skipws);
    It f(std::cin), l;

    parser<It, qi::blank_type> p;

    try
    {
        bool ok = qi::phrase_parse(f,l,p,qi::blank);
        if (ok)   std::cout << "parse success\n";
        else      std::cerr << "parse failed: '" << std::string(f,l) << "'\n";

        if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";
        return ok ? 0 : 1; // exit code 0 signals success
    } catch(const qi::expectation_failure<It>& e)
    {
        std::string frag(e.first, e.last);
        std::cerr << e.what() << "'" << frag << "'\n";
    }

    return 1; // an expectation failure counts as a failed parse
}
sehe
  • Thanks for your answer. I have updated the original question with everything gr_instruction relies on. I have already specified what may be an operand. I'm also working on the newline. Currently though, my skipper swallows newlines (I'll look at that after I finished this issue). – namezero Aug 23 '13 at 14:18
  • "already"? I believe that was three minutes ago. Also, I don't think it will change the answer, as the question didn't change. – sehe Aug 23 '13 at 14:21
  • About newline handling, you don't really need to change it, unless you anticipate `gr_operands` that may include newlines. I just prefer to keep grammars as strict as possible. In this case, you won't get a better diagnostic unless you are able to detect that the gr_instruction hasn't ended (due to a lack of newline). Because, if newlines are skippable, "hello" might be the start of the next instruction, and the parser should flag that as the failure. – sehe Aug 23 '13 at 14:22
  • Apologies for "already". I didn't mean "Hey look you missed it" rather "I've done it but didn't think it was pertinent enough to post the first time around" – namezero Aug 23 '13 at 14:33
  • What I meant was that I have the same thing as you, where operand can only be one of N types. But the parse still doesn't flag the missing comma between operands as a problem :[ (i.e. on_error() is not invoked for gr_operands, only for gr_function). – namezero Aug 23 '13 at 14:35
  • @namezero Okay. Clear. Let me paraphrase my last comment: The parser won't be able to flag a missing ',' unless it can detect the end/begin of a statement. If the ',' is missing, it will (rightly) assume that the next statement begins. So that's what will be flagged. – sehe Aug 23 '13 at 14:39
  • Ahhhh I think it's dawning on me now. Since it doesn't see the comma, it sees "hello" as the next instruction, then decides it matches nothing it knows, and complains about a premature end of the function (hence the missing '}'). Ok, so I think I'll have to figure out the newline issue. My skipper skips these now, which is good for everything but this (i.e. the '{' should be able to be on a new line). Is there a way to disable newline skipping on a per-rule basis? – namezero Aug 23 '13 at 14:44
  • @namezero Yes. Disable skipping altogether (by using no skipper in the rule declaration), or use `skip(s)[]` or `no_skip[]` directives. See [this general answer](http://stackoverflow.com/questions/10465805/how-can-i-use-the-skipper-asciispace-without-skipping-eol/10469726#10469726) for background. – sehe Aug 23 '13 at 14:46
  • Excellent! I tried the skip(space)[] for the gr_instruction. However, this results in a compile time error. I suspect because the rule has a skipper defined. However, since it's the start rule, it will fail if I take the skipper out... – namezero Aug 23 '13 at 15:06
  • @namezero I fix that kind of 'issue' by having an 'intermediate' rule as the start rule. Or, you know, just change the skipper on the relevant rules/grammar. – sehe Aug 23 '13 at 15:09