
I am using Boost Tokenizer to break a string into tokens. The tokens will be used to build a compiler for VSL, based on C/C++. What I want to ask is whether the delimiters defined with

char_separator<char> sep("; << "); 

can also be returned as tokens. For example, if I use Boost Tokenizer on the string

string s = "cout<<hello;";

it should produce the following tokens:

cout
<<
hello
;

Also, how can I make sure it does not split text inside quotes? For example,

string s = "hello my \"name is\" Hassan";

should be converted to the following tokens:

hello
my
name is
Hassan
sehe
Hassan Jalil
  • I doubt that Boost tokenizer is really up to this task ("doubt" as in, "I would be flabbergasted if it could even come close"). Tokenizing C++ source code is a fairly non-trivial task. [Here](http://stackoverflow.com/a/7051822/179910) is a possibility that might get you started. – Jerry Coffin Mar 01 '14 at 22:22
  • Oh wait. Just noticed that you actually want to parse a programming language. Adding links to samples in my answer – sehe Mar 01 '14 at 23:01



I suggest Boost Spirit (see it Live On Coliru):

Edit: See also the Spirit Qi compiler tutorial: http://www.boost.org/doc/libs/1_55_0/libs/spirit/example/qi/compiler_tutorial

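#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

int main()
{
    typedef std::string::const_iterator It;
    std::string const input = "cout<<hello;my \"name is\" Hassan";

    // A delimiter is ';', ' ' or "<<"; it is stored as a token of its own.
    qi::rule<It, std::string()> delimiter = qi::char_("; ") | qi::string("<<");
    // A quoted string yields its contents without the quotes
    // (a missing closing quote throws qi::expectation_failure because of '>').
    qi::rule<It, std::string()> quoted    = '"' >> *~qi::char_('"') > '"';
    // A word is one or more characters (or quoted strings) up to the next delimiter.
    qi::rule<It, std::string()> word      = +((quoted | qi::char_) - delimiter);

    std::vector<std::string> tokens;
    if (qi::parse(input.begin(), input.end(), *(word >> delimiter), tokens))
    {
        for (auto const& token : tokens)
            std::cout << "'" << token << "'\n";
    }
}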

Output:

'cout'
'<<'
'hello'
';'
'my'
' '
'name is'
' '
'Hassan'
sehe
  • The problem I am facing is that when I add more entries to the delimiter, the code is no longer parsed completely. If I add qi::string("<") to the delimiter, it does not parse the whole code; it gets stuck on the line include < iostream.h >. If I use qi::string("< ") instead, it works, but then it does not create tokens for x – Hassan Jalil Mar 03 '14 at 19:41
  • Congratulations: you've found out that tokenizing is more interesting than simple string comparison. I suggest you look into Boost Spirit Lex, and Boost Wave in particular (which implements a full C++ preprocessor) – sehe Mar 03 '14 at 22:12