
In XML an empty element can be represented in either of these ways:

<foo></foo>
<foo/>

If the input contains the latter, then I want to tokenize it like the former.

That is, if the input is <foo/> then I want the lexer to generate this sequence of (token value, token kind) pairs:

('<', '<')
("foo", STAG)
('>', '>')
("</foo>", ETAG)

I tried this (where <START_TAG> is an exclusive state and st is a global variable holding the element name, which is "foo" in this example):

<START_TAG>{
   "/>"    { yytext = ">";
             return(">");
             yytext = strcat(strcat("<", st), ">");
             yyval.strval = strdup(yytext);
             yy_pop_state();
             return(ETAG); 
           }
}

but it doesn't work.

Essentially I want the lexer to replace this token "/>" with these two tokens: ">" and "</foo>". How do I do that?

Roger Costello
  • Either you need to implement a token queue, or you need to use bison's push-parser interface. I prefer the push-parser but both are possible. – rici Feb 23 '22 at 00:45
  • Why? The parser should be able to cope with both forms. – user207421 Feb 23 '22 at 00:46
  • Longer answer here: https://stackoverflow.com/questions/42434603/how-can-flex-return-multiple-terminals-at-one-time/42444111 – rici Feb 23 '22 at 00:48
  • Also, I basically agree with @user207421. But it's good to know how to fabricate multiple tokens from a scanner action. – rici Feb 23 '22 at 00:49
  • That transformation requires that the algorithm understands the syntax or at least some subset of it. Syntax is not lexer's job; it doesn't have the right tools to interpret it. Parser is designed for syntax and it should be used to handle it. (Of course, you could try some ugly hacks but I doubt it will end well in the long run.) – Piotr Siupa Feb 23 '22 at 10:04

1 Answer


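Execution leaves the action at the first return statement, return(">");, so nothing after it runs and ETAG is never returned. A single call to yylex() can hand back only one token.

Also, reassigning the internal variable yytext with yytext = ">"; is never a good idea. Copy it first with strdup(yytext) and then change whatever you need in the copy. The same caution applies to strcat("<", st): it writes into a string literal, which is undefined behavior, so build the new string in a buffer you own.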
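One way around the single-return limit is the token queue that rici mentions in the comments. The following is only a minimal sketch in plain C, not a drop-in flex action: the names Token, QUEUE_CAP, push_token, pop_token and expand_empty_tag are hypothetical, and the STAG/ETAG values are placeholders for whatever the bison-generated header defines. It shows the "/>" case queuing both ">" and "</foo>" so they can be handed out one at a time.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder token kinds (assumption); a real project would take
   these from the header that bison generates. */
enum { STAG = 258, ETAG = 259 };

/* One pending token: its kind and a heap-allocated copy of its text. */
typedef struct {
    int kind;
    char *text;
} Token;

#define QUEUE_CAP 8 /* hypothetical capacity, ample for this use */

static Token queue[QUEUE_CAP];
static size_t q_head = 0;  /* index of the oldest pending token */
static size_t q_count = 0; /* number of pending tokens */

/* Return a heap-allocated copy of s (portable stand-in for strdup). */
static char *copy_string(const char *s)
{
    char *copy = malloc(strlen(s) + 1);
    if (copy == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    strcpy(copy, s);
    return copy;
}

/* Append a token to the queue; the text is copied. */
void push_token(int kind, const char *text)
{
    assert(text != NULL);
    if (q_count == QUEUE_CAP) {
        fprintf(stderr, "token queue overflow\n");
        exit(EXIT_FAILURE);
    }
    size_t tail = (q_head + q_count) % QUEUE_CAP;
    queue[tail].kind = kind;
    queue[tail].text = copy_string(text);
    q_count++;
}

/* Remove the oldest token into *out. Returns 1 on success and 0 if
   the queue is empty. The caller owns out->text and must free it. */
int pop_token(Token *out)
{
    if (q_count == 0)
        return 0;
    *out = queue[q_head];
    q_head = (q_head + 1) % QUEUE_CAP;
    q_count--;
    return 1;
}

/* What the "/>" action would do instead of trying to return twice:
   queue '>' followed by the end tag for the element called name. */
void expand_empty_tag(const char *name)
{
    assert(name != NULL);
    size_t len = strlen(name) + 4; /* "</" + name + ">" + NUL */
    char *etag = malloc(len);
    if (etag == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }
    snprintf(etag, len, "</%s>", name);
    push_token('>', ">");
    push_token(ETAG, etag);
    free(etag); /* push_token made its own copy */
}
```

In a real scanner, the parser would call a small wrapper instead of yylex() directly: the wrapper first tries pop_token, and only when the queue is empty does it call the generated scanner. The "/>" action would then call expand_empty_tag(st) and pop the state, leaving the wrapper to deliver the two queued tokens on successive calls.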

Pax