
I'm working with Spirit 2.4 and I want to parse a structure like this:

Text{text_field};

The catch is that text_field is an escaped string in which the symbols '{', '}' and '\' must be escaped. I would like to create a parser for this using qi. This is what I've tried so far:

using boost::spirit::standard::char_;
using boost::spirit::standard::string;
using qi::lexeme;
using qi::lit;

qi::rule< IteratorT, std::string(), ascii::space_type > text;
qi::rule< IteratorT, std::string(), ascii::space_type > content;
qi::rule< IteratorT, std::string(), ascii::space_type > escChar;


text %= 
  lit( "Text" ) >> '{' >>
    content >>
  "};"
  ;

content %= lexeme[ +( +(char_ - ( lit( '\\' ) | '}' ) )  >> escChar ) ];

escChar %= string( "\\\\" ) 
  | string( "\\{" ) 
  | string( "\\}" );

But it doesn't even compile. Any ideas?

Bruno

1 Answer


Your grammar could be written as:

qi::rule< IteratorT, std::string(), ascii::space_type > text; 
qi::rule< IteratorT, std::string() > content;   
qi::rule< IteratorT, char() > escChar;   

text = "Text{" >> content >> "};";  
content = +(~char_('}') | escChar); 
escChar = '\\' >> char_("\\{}");

i.e.

  • text is Text{ followed by content followed by };

  • content is at least one instance of either a character (but no }) or an escChar

  • escChar is a single escaped \, {, or }

Note that the escChar rule now returns a single character and discards the escaping \. I'm not sure if that's what you need. Additionally, I removed the skipper from the content and escChar rules, which makes the lexeme[] unnecessary (a rule without a skipper acts like an implicit lexeme).

Felix Dombek
hkaiser
  • Hi, hkaiser, and thanks for helping. I've tried your solution but it fails to parse this: Text{ \} }; I thought that it was because the parser ~char_('}') matches the backslash, but I tried the following with no success: content = +( ~char_( "\\\\}" ) | escChar );. Any idea? – Bruno Oct 27 '10 at 17:19
  • Yeah, right. ~char_('}') does indeed match the backslash. I'm sorry for this oversight. If you change that to ~char_("\\}") it should not do that anymore. – hkaiser Oct 28 '10 at 01:37