
How does one use boost::spirit with an input that consists of something other than characters?

In my case, I have a std::vector<AbstractBaseClass> that I would like to treat as a token stream for my grammar, where each AbstractBaseClass is a token. Something like:

struct AbstractBaseClass
{
};

struct ConcreteClassA : public AbstractBaseClass
{
};

struct ConcreteClassB : public AbstractBaseClass
{
};


std::vector<AbstractBaseClass> stream;
std::vector<AbstractBaseClass>::iterator iter = stream.begin();
std::vector<AbstractBaseClass>::iterator end = stream.end();
bool r = boost::spirit::qi::parse( iter, end, TOKEN_ID_FOR_CONCRETE_CLASS_A >> TOKEN_ID_FOR_CONCRETE_CLASS_B >> TOKEN_ID_FOR_CONCRETE_CLASS_A );

What methods do I need to add to my classes, and what would the token IDs look like, to support this?

Presumably I need to provide something analogous to boost::spirit::lex::token_def<> and boost::spirit::lex::token<>.

I have looked into using these directly, but these two classes seem to assume that there is a raw character stream under the lexer token, which is not true in my case; I get the tokens directly.

Edit:

Well, I answered my own question. I'll leave this up in case anybody else finds it useful. The basics are explained here. There are a handful of caveats.

  • My first attempt was to use boost::variant to describe my tokens. The parser requires that the tokens be convertible to bool. To solve this, I wrapped my boost::variant in boost::optional. Edit: Actually, it seems it's the debugging capability that imposes this requirement. My current solution replaces the stock debug handler with a custom one that no longer checks whether the value of the iterator is "true".
  • Similarly, operator<< must be defined for the token type, at least if you want debug output.
  • In the parse() method, you need to check that your iterator is not at the end before you dereference it.
  • If you have lots of token types you may need to increase the size of MPL vector and list as described here.
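The end-of-input check and the operator<< requirement from the list above can be sketched without Boost. This is only an illustration of the matching logic, not the code from the post: Token, TokenId, match_token and match_a_b_a are hypothetical names, and a plain enum tag stands in for the boost::variant. In a real Qi primitive parser the same logic would sit inside a parse() member that also receives context, skipper and attribute arguments.

```cpp
#include <cassert>
#include <ostream>
#include <vector>

// Hypothetical token type: a plain tag stands in for the boost::variant
// mentioned in the post.
enum class TokenId { A, B };

struct Token {
    TokenId id;
};

// Stream output for tokens; the post notes this is needed for debug output.
inline std::ostream& operator<<(std::ostream& os, Token const& t) {
    return os << (t.id == TokenId::A ? "A" : "B");
}

// Shaped like the core of a custom parser's parse(): match one token,
// advance the iterator on success, leave it untouched on failure.
template <typename Iterator>
bool match_token(Iterator& first, Iterator const& last, TokenId expected) {
    if (first == last)  // check for end of input before dereferencing
        return false;
    if (first->id != expected)
        return false;
    ++first;
    return true;
}

// Hand-written equivalent of the sequence A >> B >> A from the question.
// The caller's iterator is only advanced if the whole sequence matches.
template <typename Iterator>
bool match_a_b_a(Iterator& first, Iterator const& last) {
    Iterator it = first;
    if (match_token(it, last, TokenId::A) &&
        match_token(it, last, TokenId::B) &&
        match_token(it, last, TokenId::A)) {
        first = it;
        return true;
    }
    return false;
}
```

Calling match_a_b_a on a std::vector<Token> holding A, B, A succeeds and leaves the iterator at the end. On a vector holding only A, B, the third match_token call hits the end-of-input check and returns false instead of dereferencing past the end.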
tgoodhart
  • If you don't get an answer here, try asking on the Boost Spirit mailing list. It's very active. https://lists.sourceforge.net/lists/listinfo/spirit-general – Smittii Mar 28 '12 at 21:27
  • Or maybe, if no one is giving you an answer here, you should start questioning whether using `boost::spirit` is really a good idea. – 6502 Mar 29 '12 at 21:48

1 Answer


Your self-answer seems to address a similar, but different question:

  • How can I create a parser class that consumes non-char elements

However, your original question was more along the lines of 'How can I use spirit parsers with a non-char tokenstream'?

In that case, the most helpful link would be to Spirit Lex, which is the Lexertl library integrated into the Boost Spirit framework.

You can easily make Spirit Lex expose token information (beyond the token ID) if necessary, though by default the source iterator range is always available. That way you can mix and match Spirit Lex and Spirit Qi in quite flexible ways.

I don't have time to work out a simple example but,

sehe
  • As for your first point, I'd agree to an extent. I took it for granted that if somebody could figure out how to write a non-character parser then the usage was obvious. The last 3 lines of my example show a parser consuming arbitrary tokens. As for Lex, while it is true that using Lex with a Spirit parser results in the parser consuming a non-character token stream (specifically, it consumes a boost::spirit::lex::token<> stream), as far as I can tell it is not possible to create Lex tokens that don't refer to a base character stream. – tgoodhart Apr 02 '12 at 15:52