I am writing a grammar that contains a rule for parsing email addresses. The rule is declared as:

qi::rule<Iterator, ascii::space_type, std::string()> email;

and its definition is:

email 
   =
      qi::lexeme[
          +ascii::alnum 
          >> *(qi::char_(".") >> +ascii::alnum) 
          >>  qi::char_("@") 
          >> +ascii::alnum 
          >> +(qi::char_(".") >> +ascii::alnum)
      ];

When I parse a text using this grammar, the parser correctly matches the email address, but the rule's synthesized attribute does not contain the full address. For example, if the text contains the address info.it@example.com, the synthesized attribute is info.@example. I think this is due to the Kleene star and plus operators.

I am using Boost 1.48. I have tested the code with Boost 1.54, where it works properly, but unfortunately I cannot upgrade to that version in my project.

How can I work around this problem, maybe using semantic actions?

giulatona

1 Answer

Interesting.

I suppose it has to do with a change in how container attributes get appended to by subsequent container-handling parser expressions.

I'm not going to install that library version, but here are a few things you can do:

NOTE

  • Your pattern does not cover general email addresses; the real syntax is much more complicated. I'm assuming your rule is right for your internal requirements.

  • Your rule doesn't allow .. anywhere, right? Assuming this is on purpose too

  • Your rule doesn't allow . at the start or end of a substring either. Assuming this is on purpose too

  1. Drop the skipper since the whole rule is a lexeme: (see Boost spirit skipper issues)

    qi::rule<Iterator, std::string()> email;
    
    email =
            +ascii::alnum
            >> *(qi::char_(".") >> +ascii::alnum)
            >>  qi::char_("@")
            >> +ascii::alnum
            >> +(qi::char_(".") >> +ascii::alnum)
            ;
    
  2. Now, use either raw[] or as_string[] to gather the whole input:

    qi::rule<Iterator, std::string()> email;
    
    email = qi::as_string [
        +ascii::alnum
        >> *(qi::char_(".") >> +ascii::alnum)
        >>  qi::char_("@")
        >> +ascii::alnum
        >> +(qi::char_(".") >> +ascii::alnum)
    ];
    
  3. Using raw[], you don't even need the attribute capturing, which makes the rule both simpler and more efficient:

    qi::rule<Iterator, std::string()> email;
    
    email = qi::raw [
           +ascii::alnum >> *('.' >> +ascii::alnum)
        >> '@'
        >> +ascii::alnum >> +('.' >> +ascii::alnum)
    ];
    
sehe
  • Thank you, you assumed correctly, the parser is suited for my application. I tested your solution and it works perfectly when using raw[]; however, solution #2 does not work, it still produces an output like info.@example. – giulatona Mar 26 '15 at 13:34
  • @giulatona Yeah, that makes some sense. It was worth a try though. Like I said, I haven't installed the more than 3-year-old version of Boost to test the approach :) With later Boost versions, they all work – sehe Mar 26 '15 at 13:41