I'm looking at writing a lexer using boost::spirit::lex, but all the examples I can find seem to assume that you've read the entire file into RAM first. I'd like to write a lexer that doesn't require the whole input to be in RAM. Is that possible, or do I need to use something else?

I tried using istream_iterator, but Boost gives me a compile error unless I use const char* as the iterator type.

For instance, every example I can find basically does this:

lex_functor_type< lex::lexertl::lexer<> > lex_functor;

// assumes entire file is in memory
char const* first = str.c_str();
char const* last = &first[str.size()];

bool r = lex::tokenize(first, last, lex_functor, 
    boost::bind(lex_callback_functor(), _1, ... ));

Also, is it possible to determine line/column numbers from lex tokens somehow?

Thanks!

GManNickG
Brian

1 Answer

Spirit Lex works with any iterator as long as it conforms to the requirements of standard forward iterators. That means you can feed the lexer (invoke lex::tokenize()) with any conforming iterator. Note that std::istream_iterator is only an input iterator, so it does not qualify on its own. If you want to use a std::istream, you can wrap it in a boost::spirit::istream_iterator, which is a forward iterator:

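bool tokenize(std::istream& is, ...)
{
    // Stop the stream from skipping whitespace before the lexer sees it.
    is.unsetf(std::ios::skipws);

    // The token type has to be parameterized on the iterator type;
    // the default lexer<> assumes char const* iterators.
    typedef boost::spirit::istream_iterator iterator_type;
    typedef lex::lexertl::token<iterator_type> token_type;
    lex_functor_type< lex::lexertl::lexer<token_type> > lex_functor;

    iterator_type first(is);
    iterator_type last;

    return lex::tokenize(first, last, lex_functor,
        boost::bind(lex_callback_functor(), _1, ... ));
}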

and it would work.
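
One thing to watch for when lexing straight from a stream: formatted extraction skips whitespace by default, so a character iterator built on a std::istream will silently drop spaces and newlines unless the skipws flag is cleared first. The sketch below shows the effect using only the standard library. It does not use Spirit at all, and read_all is a hypothetical helper name introduced for illustration.

```cpp
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: pull every character out of a stream through
// std::istream_iterator<char>, optionally clearing skipws beforehand.
std::string read_all(std::istream& is, bool keep_whitespace)
{
    if (keep_whitespace)
        is.unsetf(std::ios::skipws);   // same effect as is >> std::noskipws

    std::istream_iterator<char> first(is), last;
    return std::string(first, last);
}
```

With the flag left at its default, the input "int x = 1;" comes back without any of its spaces, which would make most token definitions misbehave. Clearing the flag once, before the iterator is constructed, is enough.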

For the second part of your question (the line/column number of the input): yes, it is possible to track the input position using the lexer. It's not trivial, though. You need to create your own token type that stores the line/column information and use it instead of the predefined token type. Many people have been asking for this, so I might just go ahead and create an example.
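
To make the idea of a position-carrying token concrete, here is a minimal sketch that uses only the standard library. It is not Spirit's token type and says nothing about how Spirit implements this; position, positioned_token and tokenize_words are hypothetical names, and splitting on whitespace stands in for a real lexer. The point is only that a token can record the line and column where it started while the input is consumed one character at a time from a stream.

```cpp
#include <cctype>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical types for illustration only (not part of Spirit).
struct position {
    std::size_t line;
    std::size_t column;
};

struct positioned_token {
    std::string text;   // matched characters
    position start;     // 1-based line/column of the first character
};

// Splits the stream into whitespace-separated words, reading one
// character at a time and recording where each word begins.
std::vector<positioned_token> tokenize_words(std::istream& is)
{
    std::vector<positioned_token> tokens;
    position pos = {1, 1};
    positioned_token current;
    bool in_token = false;

    std::istreambuf_iterator<char> it(is), end;
    for (; it != end; ++it) {
        const char c = *it;
        if (std::isspace(static_cast<unsigned char>(c))) {
            if (in_token) {             // whitespace ends the current word
                tokens.push_back(current);
                in_token = false;
            }
        } else {
            if (!in_token) {            // first character of a new word
                current.text.clear();
                current.start = pos;
                in_token = true;
            }
            current.text += c;
        }
        // Advance the position after the character has been handled.
        if (c == '\n') {
            ++pos.line;
            pos.column = 1;
        } else {
            ++pos.column;
        }
    }
    if (in_token)
        tokens.push_back(current);      // flush a word that ends at EOF
    return tokens;
}
```

Feeding it a std::istringstream containing "foo bar\n  baz" yields three tokens whose start positions are line 1 column 1, line 1 column 5, and line 2 column 3.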

hkaiser
  • +1, yes, new examples in Spirit documentation would be great :) – Viet Jul 04 '11 at 05:46
  • I did that, actually. Boost V1.47 will have such a token type and a new example demonstrating how to use it. – hkaiser Jul 04 '11 at 11:22
  • Thank you Hartmut! Very much looking forward to the Boost 1.47 release with the new Spirit! – Viet Jul 09 '11 at 03:44
  • Has anybody got it run without reading whole file into memory? – user1587451 Aug 11 '14 at 16:09
  • @hkaiser I could not find the example you are referring to. Can you please see [this question](http://stackoverflow.com/questions/32073454/how-to-determine-line-column-numbers-from-boostspiritlex-tokens)? Thank you. – ZeeByeZon Oct 04 '15 at 13:17