In XML an empty element can be represented in either of these ways:
<foo></foo>
<foo/>
If the input contains the latter, then I want to tokenize it like the former.
That is, if the input is <foo/>, then I want the lexer to generate this sequence of (token value, token kind) pairs:
('<', '<')
("foo", STAG)
('>', '>')
("</foo>", ETAG)
I tried this (where <START_TAG> is an exclusive state and st is a global variable holding the element name, which is "foo" in this example):
<START_TAG>{
    "/>" {
        yytext = ">";
        return(">");
        yytext = strcat(strcat("<", st), ">");
        yyval.strval = strdup(yytext);
        yy_pop_state();
        return(ETAG);
    }
}
but it doesn't work.
Essentially I want the lexer to replace this token "/>" with these two tokens: ">" and "</foo>". How do I do that?
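One approach that might work is to return ">" right away, remember that an end tag is still owed, and hand out that end tag at the start of the next call before reading any more input. The sketch below shows the idea as a small hand-written lexer in plain C, not as a flex specification. The names lex_init, next_token, text, in_tag, pending and TOK_EOF are hypothetical and made up for illustration; only st, STAG and ETAG come from the code above. It also avoids two problems in the attempt: statements after the first return are never reached, and strcat cannot safely append to a string literal.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical token kinds; '<' and '>' are returned as their own character codes. */
enum { TOK_EOF = 0, STAG = 256, ETAG };

static const char *src;   /* remaining input */
static char st[64];       /* current element name, like `st` in the question */
static char text[80];     /* value of the most recent token */
static int in_tag;        /* nonzero while inside a start tag */
static int pending;       /* nonzero when a synthesized end tag is owed */

void lex_init(const char *input)
{
    src = input;
    in_tag = 0;
    pending = 0;
    st[0] = '\0';
    text[0] = '\0';
}

int next_token(void)
{
    /* Deliver the owed end tag before looking at any more input. */
    if (pending) {
        pending = 0;
        snprintf(text, sizeof text, "</%s>", st);
        return ETAG;
    }
    if (*src == '\0')
        return TOK_EOF;

    /* Explicit end tag such as </foo>: copy it through as one ETAG token. */
    if (src[0] == '<' && src[1] == '/') {
        const char *end = strchr(src, '>');
        size_t n = end ? (size_t)(end - src) + 1 : strlen(src);
        if (n >= sizeof text)
            n = sizeof text - 1;
        memcpy(text, src, n);
        text[n] = '\0';
        src += n;
        return ETAG;
    }
    if (*src == '<') {
        src++;
        in_tag = 1;
        strcpy(text, "<");
        return '<';
    }
    /* The interesting case: "/>" yields '>' now and an ETAG on the next call. */
    if (in_tag && src[0] == '/' && src[1] == '>') {
        src += 2;
        in_tag = 0;
        pending = 1;
        strcpy(text, ">");
        return '>';
    }
    if (in_tag && *src == '>') {
        src++;
        in_tag = 0;
        strcpy(text, ">");
        return '>';
    }
    /* Element name: remember it in st so the end tag can be rebuilt later. */
    if (in_tag && isalpha((unsigned char)*src)) {
        size_t n = 0;
        while (isalnum((unsigned char)*src) && n < sizeof st - 1)
            st[n++] = *src++;
        st[n] = '\0';
        strcpy(text, st);
        return STAG;
    }
    /* Anything else (character data, whitespace) is skipped in this sketch. */
    src++;
    return next_token();
}
```

After calling lex_init("<foo/>"), repeated calls to next_token() are meant to yield '<', STAG, '>', ETAG and then TOK_EOF, with text holding the value of each token. In a flex scanner the same pending check could presumably go in the indented code at the top of the rules section, which flex runs on every entry to yylex(). That variant is untested here.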