How can I use the flex lexer in C++ and modify a token's yytext value? Let's say I have a rule like this:

"/*"    {
        int c;  /* int rather than char, so the value can be compared with EOF */
        while(true)
            {
            c = yyinput();
            if(c == '\n')
                ++mylineno;

            if (c==EOF){
                yyerror( "EOF occurred while processing comment" );
                break;
            }
            else if(c == '*')
                {
                if((c = yyinput()) == '/'){
                    return(tokens::COMMENT);}
                else
                    unput(c);
                }
            }
        }

And I want to get the token tokens::COMMENT with the comment text between /* and */ as its value. (The above solution gives "/*" as the value.)

Additionally, tracking the line number is very important, so I'm looking for a solution that supports it.

EDIT: Of course I can modify the yytext and yyleng values (like yytext+=1; yyleng-=1), but I still cannot solve the above problem.

Lesmana
Wojciech Danilo
  • Do you have a parser taking tokens from here, or is it just a lexer? You can solve this easily in the parser. –  Apr 05 '13 at 14:30
  • I really would like to solve it in the lexer - is it possible? – Wojciech Danilo Apr 05 '13 at 14:41
  • Check out this existing answer: http://stackoverflow.com/a/2130124/1003855 – Josh Apr 05 '13 at 14:45
  • Sorry, but there is no answer to my question :( – Wojciech Danilo Apr 05 '13 at 14:49
  • @danilo2 ok. Then how do you handle strings? do you have some pool to store them in? or show me how you recognize a string literal. –  Apr 05 '13 at 14:50
  • Strings have the same problem when using flex: I have to read the substring in the "parser", not the "tokenizer", because the tokenizer cannot return a string without the leading and trailing quotation marks. If it could, this problem would be solved as well (and the string handling would be nicer). – Wojciech Danilo Apr 05 '13 at 14:53
  • Yes but if you have some way of recognizing `"somethin"` as string I have suggested an answer for you. Take a look. –  Apr 05 '13 at 14:56
  • @stardust_ - Of course I have, and of course it is easily solvable in the parser, BUT *the lexer is responsible for outputting tokens with good values*. If I have to change the values *later*, this is not a pure design. – Wojciech Danilo Apr 06 '13 at 14:16

1 Answer

I still think start conditions are the right answer.

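%{
#include <stdlib.h>
#include <string.h>

char *str = NULL;   /* comment text collected so far */

/* Append data to str, growing the buffer as needed. */
void addToString(const char *data)
{
    size_t oldLen = str ? strlen(str) : 0;
    char *grown = (char *)realloc(str, oldLen + strlen(data) + 1);
    if (!grown)
        return;     /* allocation failed: keep what was collected */
    strcpy(grown + oldLen, data);
    str = grown;
}
%}

%x C_COMMENT

%%

"/*"                    { free(str); str = NULL; BEGIN(C_COMMENT); /* copy str first if the previous comment is still needed */ }
<C_COMMENT>[^*\n\r]+    { addToString(yytext); }
<C_COMMENT>"*"          { addToString(yytext); /* a star that does not start the closing delimiter */ }
<C_COMMENT>[\n\r]       { /* handle tracking, add to string if desired */ }
<C_COMMENT>"*/"         { BEGIN(INITIAL); }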

I used the following as references:
http://ostermiller.org/findcomment.html
https://stackoverflow.com/a/2130124/1003855

You should be able to use a similar regular expression to handle strings.
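
Since the question is about a C++ scanner, the concatenation that addToString has to do could also be delegated to std::string. The following is only a sketch under that assumption: CommentBuffer and its member names are hypothetical and not part of flex, and the comments describe where a scanner using the C_COMMENT start condition might call them.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of flex): collects the text of one comment
// while the scanner is inside the C_COMMENT start condition.
class CommentBuffer {
public:
    // Intended for the "/*" action: discard any previous comment and
    // remember the line on which the new one starts.
    void begin(int startLine) {
        text_.clear();
        startLine_ = startLine;
    }

    // Intended for the actions that match comment text,
    // e.g. append(yytext, yyleng).
    void append(const char* data, std::size_t length) {
        text_.append(data, length);
    }

    // Comment body without the surrounding delimiters.
    const std::string& text() const { return text_; }

    // Line number that was passed to begin().
    int startLine() const { return startLine_; }

private:
    std::string text_;
    int startLine_ = 0;
};
```

With one such object visible to the actions, the newline rule could increment mylineno and append the newline, and the "*/" action could hand text() to wherever the parser reads token values before returning tokens::COMMENT. How the value reaches the parser depends on the rest of the program, which the question does not show.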

Community
Josh