10

I try to implement a Lexer for a little programming language with Boost Spirit.

I have to get the value of a token and I get a bad_get exception :

terminate called after throwing an instance of 'boost::bad_get'
what(): boost::bad_get: failed value get using boost::get Aborted

I obtain this exception when doing :

std::string contents = "void";

base_iterator_type first = contents.begin();
base_iterator_type last = contents.end();

SimpleLexer<lexer_type> lexer;

iter = lexer.begin(first, last);
end = lexer.end();

std::cout << "Value = " << boost::get<std::string>(iter->value()) << std::endl;

My lexer is defined like that :

typedef std::string::iterator base_iterator_type;
typedef boost::spirit::lex::lexertl::token<base_iterator_type, boost::mpl::vector<unsigned int, std::string>> Tok;
typedef lex::lexertl::actor_lexer<Tok> lexer_type;

template<typename L>
class SimpleLexer : public lex::lexer<L> {
    private:

    public:
        SimpleLexer() {
            keyword_for = "for";
            keyword_while = "while";
            keyword_if = "if";
            keyword_else = "else";
            keyword_false = "false";
            keyword_true = "true";
            keyword_from = "from";
            keyword_to = "to";
            keyword_foreach = "foreach";

            word = "[a-zA-Z]+";
            integer = "[0-9]+";
            litteral = "...";

            left_parenth = '('; 
            right_parenth = ')'; 
            left_brace = '{'; 
            right_brace = '}'; 

            stop = ';';
            comma = ',';

            swap = "<>";
            assign = '=';
            addition = '+';
            subtraction = '-';
            multiplication = '*';
            division = '/';
            modulo = '%';

            equals = "==";
            not_equals = "!=";
            greater = '>';
            less = '<';
            greater_equals = ">=";
            less_equals = "<=";

            whitespaces = "[ \\t\\n]+";
            comments = "\\/\\*[^*]*\\*+([^/*][^*]*\\*+)*\\/";

            //Add keywords
            this->self += keyword_for | keyword_while | keyword_true | keyword_false | keyword_if | keyword_else | keyword_from | keyword_to | keyword_foreach;
            this->self += integer | litteral | word;

            this->self += equals | not_equals | greater_equals | less_equals | greater | less ;
            this->self += left_parenth | right_parenth | left_brace | right_brace;
            this->self += comma | stop;
            this->self += assign | swap | addition | subtraction | multiplication | division | modulo;

            //Ignore whitespaces and comments
            this->self += whitespaces [lex::_pass = lex::pass_flags::pass_ignore];
            this->self += comments [lex::_pass = lex::pass_flags::pass_ignore]; 
        }

        lex::token_def<std::string> word, litteral, integer;

        lex::token_def<lex::omit> left_parenth, right_parenth, left_brace, right_brace;

        lex::token_def<lex::omit> stop, comma;

        lex::token_def<lex::omit> assign, swap, addition, subtraction, multiplication, division, modulo;
        lex::token_def<lex::omit> equals, not_equals, greater, less, greater_equals, less_equals;

        //Keywords
        lex::token_def<lex::omit> keyword_if, keyword_else, keyword_for, keyword_while, keyword_from, keyword_to, keyword_foreach;
        lex::token_def<lex::omit> keyword_true, keyword_false;

        //Ignored tokens
        lex::token_def<lex::omit> whitespaces;
        lex::token_def<lex::omit> comments;
};

Is there an other way to get the value of a Token ?

Baptiste Wicht
  • 7,472
  • 7
  • 45
  • 110
  • 3
    on reading again, I notice that you specify `lex::omit` as the token attribute type. These tokens won't expose _any_ value data (not even iterator pairs). This might be your problem. Otherwise, I heartily recommend parsing using Qi on top of token iterators: get the best of both worlds. – sehe Oct 14 '11 at 09:26
  • I verified and sadly this is not the problem. I only use boost::get on a token that of the good type and that should have the value. – Baptiste Wicht Oct 14 '11 at 10:52

1 Answers1

9

You can always use the 'default' token data (which is iterator_range of the source iterator type).

std::string tokenvalue(iter->value().begin(), iter->value().end());

After studying the test cases in the boost repository, I found out a number of things:

  • this is by design
  • there is an easier way
  • the easier way comes automated in Lex semantic actions (e.g. using _1) and when using the lexer token in Qi; the assignment will automatically convert to the Qi attribute type
  • this has (indeed) got the 'lazy, one-time, evaluation' semantics mentioned in the docs

The cinch is that the token data is variant, which starts out as the raw input iterator range. Only after 'a' forced assignment, the converted attribute is cached in the variant. You can witness the transition:

lexer_type::iterator_type iter = lexer.begin(first, last);
lexer_type::iterator_type end = lexer.end();

assert(0 == iter->value().which());
std::cout << "Value = " << boost::get<boost::iterator_range<base_iterator_type> >(iter->value()) << std::endl;

std::string s;
boost::spirit::traits::assign_to(*iter, s);
assert(1 == iter->value().which());
std::cout << "Value = " << s << std::endl;

As you can see, the attribute assignment is forced here, directly using the assign_to trait implementation.

Full working demonstration:

#include <boost/spirit/include/lex_lexertl.hpp>

#include <iostream>
#include <string>

namespace lex = boost::spirit::lex;

typedef std::string::iterator base_iterator_type;
typedef boost::spirit::lex::lexertl::token<base_iterator_type, boost::mpl::vector<int, std::string>> Tok;
typedef lex::lexertl::actor_lexer<Tok> lexer_type;

template<typename L>
class SimpleLexer : public lex::lexer<L> {
    private:

    public:
        SimpleLexer() {
            word = "[a-zA-Z]+";
            integer = "[0-9]+";
            literal = "...";

            this->self += integer | literal | word;
        }

        lex::token_def<std::string> word, literal;
        lex::token_def<int> integer;
};

int main(int argc, const char* argv[]) {
    SimpleLexer<lexer_type> lexer;

    std::string contents = "void";

    base_iterator_type first = contents.begin();
    base_iterator_type last = contents.end();

    lexer_type::iterator_type iter = lexer.begin(first, last);
    lexer_type::iterator_type end = lexer.end();

    assert(0 == iter->value().which());
    std::cout << "Value = " << boost::get<boost::iterator_range<base_iterator_type> >(iter->value()) << std::endl;

    std::string s;
    boost::spirit::traits::assign_to(*iter, s);
    assert(2 == iter->value().which());
    std::cout << "Value = " << s << std::endl;

    return 0;
}
sehe
  • 374,641
  • 47
  • 450
  • 633
  • Looks a bit overcomplicated for something that should be done by Spirit. In my case the token are typed to get their value so I get a variant from value() not directly an iterator. I also have a int token. With your technique, you do not take advantage of the variant provided by value(), no ? – Baptiste Wicht Oct 14 '11 at 08:54
  • 1
    What part is overcomplicated? The part where it said `std::string(iter->value().begin(), iter->value().end())`? I didn't spell it out (you expect us to read into your Looooong sample to '_get_' what you mean, and you don't want to read 7 lines of `showtoken` to see how it's done? Mmmm.) My sample probably seems overcomplicated because it is a full example of how to use this in a real life parser to achieve, e.g. error reporting. Sorry for showing things that didn't interest you :) – sehe Oct 14 '11 at 08:59
  • The thing I find overcomplicated is that we parse by hand something that Boost Spirit gives to us. If I have float, int, string and bool tokens and I want to get their primitive values, I have to create 4 functions of parsing, no ? And normally these values are stored in the boost::variant. Or perhaps I do not understand the return value of value() function. – Baptiste Wicht Oct 14 '11 at 09:12
  • 2
    I suspect you might misunderstand the use of Lexer. The Lexer parses into tokens (which are by definition just source iterator ranges). If you want simple and automatic value extraction, use Spirit Qi (although, granted, Lexer tokens _can_ give you [_attribute values_ directly](http://www.boost.org/doc/libs/1_47_0/libs/spirit/doc/html/spirit/lex/tutorials/lexer_quickstart3.html). Will update answer in a bit. – sehe Oct 14 '11 at 09:24
  • I've worked out the conditions in which the evaluation happens. I hope that information helps. – sehe Oct 14 '11 at 21:46