
Assuming I have the following rule:

identifier %= 
        lexeme[
            char_("a-zA-Z")
            >> -(*char_("a-zA-Z_0-9")
            >> char_("a-zA-Z0-9"))
        ]
        ;

qi::rule<Iterator, std::string(), Skipper> identifier;

and the following input:

// identifier
This_is_a_valid123_Identifier

As the traces show, the identifier is parsed properly and the attributes are set, but the skipper then starts again one character after the first character of the string:

<identifier>
  <try>This_is_a_valid123_I</try>
  <skip>
    <try>This_is_a_valid123_I</try>
    <emptylines>
      <try>This_is_a_valid123_I</try>
      <fail/>
    </emptylines>
    <comment>
      <try>This_is_a_valid123_I</try>
      <fail/>
    </comment>
    <fail/>
  </skip>
  <success>his_is_a_valid123_Id</success>
  <attributes>[[T, h, i, s, _, i, s, _, a, _, v, a, l, i, d, 1, 2, 3, _, I, d, e, n, t, i, f, i, e, r]]</attributes>
</identifier>
<skip>
  <try>his_is_a_valid123_Id</try>
  <emptylines>
    <try>his_is_a_valid123_Id</try>
    <fail/>
  </emptylines>
  <comment>
    <try>his_is_a_valid123_Id</try>
    <fail/>
  </comment>
  <fail/>
</skip>

I've already tried to use as_string in the lexeme expression which did not help.

Baradé

1 Answer


I don't see why you complicate the expression. Can you try this instead:

identifier %= 
                char_("a-zA-Z")
            >> *char_("a-zA-Z_0-9")
        ;

qi::rule<Iterator, std::string()> identifier;

This is about the most standard expression you can get. Even if you don't want to allow identifiers ending in _, I'm quite sure you don't want such a trailing _ to be parsed as 'the next token'. In that case, I'd just add validation after the parse.
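A minimal sketch of what such a post-parse check could look like, using only the standard library. The helper name `has_no_trailing_underscore` is made up for illustration and is not part of Spirit; it assumes the string has already been accepted by the simple rule above, so only the extra constraint is tested:

```cpp
#include <string>

// Hypothetical helper, not a Spirit API: run it on the std::string
// attribute after the simple identifier rule has matched.
bool has_no_trailing_underscore(const std::string& id)
{
    // An empty string is never a valid identifier; otherwise only the
    // last character matters for this particular constraint.
    return !id.empty() && id.back() != '_';
}
```

If the check fails you can report a proper error ("identifier may not end in an underscore") instead of letting the parser silently split the token.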

Update, in response to the comment:

Here is the analysis:

  • First off: -(*x) is a red flag. It is never a useful pattern: *x already matches an empty sequence, so you can't make it "more optional".

    (In fact, if *x allowed partial backtracking as in regular expressions, you'd likely have seen exponential performance or even infinite runtime; "luckily", *x is always greedy in Spirit Qi.)

This indeed facilitates your bug. Let's look at your parser expression in the OP as lines 1, 2, 3.

  • First, Line 1 matches T.
  • The second line initially greedily matches his_is_a_valid123_Identifier.
  • But that cannot satisfy the third line, so the -(...) kicks in and everything after line 1 is backtracked.
  • However: Qi

    • does backtrack the cursor (current input iterator) but
    • does not by default rollback changes to container attributes.

    Yes. You guessed it. std::string is a container attribute.

So what you end up with is a successful match of length 1, plus the residue of a failed optional sequence in the attribute.
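The sequence of events can be mimicked without Spirit at all. The sketch below is a hand-written stand-in for the three lines of the rule in the question; every name in it (`sketch::tail`, `sketch::parse_identifier`, the `hold` flag) is hypothetical and only illustrates the order of operations. With `hold == false` the cursor is restored but the string is not. With `hold == true` the optional part works on a copy that is committed only on success, which is roughly the copy-and-commit idea behind Qi's `hold[]` directive.

```cpp
#include <cstddef>
#include <string>

// Hand-written stand-ins, NOT Spirit APIs: they only mimic the order of events.
namespace sketch {

bool is_alnum(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
}
bool is_alpha(char c) { return is_alnum(c) && !(c >= '0' && c <= '9'); }
bool is_word(char c)  { return is_alnum(c) || c == '_'; }

// Lines 2 and 3 of the rule: *char_("a-zA-Z_0-9") >> char_("a-zA-Z0-9")
bool tail(const std::string& in, std::size_t& pos, std::string& attr)
{
    // Greedy star: consume every word character, appending to attr on the way.
    while (pos < in.size() && is_word(in[pos]))
        attr.push_back(in[pos++]);

    // Line 3 wants one more alphanumeric, but the greedy star never leaves one.
    if (pos < in.size() && is_alnum(in[pos])) {
        attr.push_back(in[pos++]);
        return true;
    }
    return false;
}

// Returns the number of characters consumed; attr receives the attribute.
std::size_t parse_identifier(const std::string& in, std::string& attr, bool hold)
{
    std::size_t pos = 0;
    if (in.empty() || !is_alpha(in[0]))
        return 0;
    attr.push_back(in[pos++]);                 // line 1 matches one letter

    const std::size_t saved = pos;             // the cursor is always saved
    if (hold) {
        std::string copy = attr;               // parse into a copy ...
        if (tail(in, pos, copy))
            attr.swap(copy);                   // ... and commit only on success
        else
            pos = saved;
    } else if (!tail(in, pos, attr)) {
        pos = saved;                           // cursor rolled back, attr is not
    }
    return pos;
}

} // namespace sketch
```

Called on the input from the question with `hold == false`, this consumes one character while the attribute holds the whole identifier, which matches the trace. With `hold == true` the attribute is just the first letter, which makes it obvious that the optional tail never matches.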

Some other backgrounders on how to resolve this kind of backtracking issue:

sehe
  • And why is it not possible to use the parser I posted? – Baradé Mar 26 '14 at 22:08
  • Honestly, I haven't looked. I'm in the habit of reasoning towards the goal. "Organically grown" parser expressions might not even complete, and reasoning about them is... tough. However, see the **updated** answer. Cheers. – sehe Mar 26 '14 at 22:41
  • This should be marked as the answer! It's a very good explanation!! – Raydel Miranda Jul 03 '14 at 17:27