The keywords are %option reentrant or %option c++.
As an example, here's the ncr2a scanner:
/** ncr2a_lex.l: Replace all NCRs by corresponding printable ASCII characters. */
%%
&#(1([01][0-9]|2[0-6])|3[2-9]|[4-9][0-9]); { /* accept 32..126 */
  /** `+2` skips '&#', `atoi()` ignores ';' at the end */
  fputc(atoi(yytext + 2), yyout); /* non-recursive version */
}
The scanner code can be left unchanged.
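To see what the rule does without running flex, here is a minimal hand-written C sketch of the same logic. The names ncr_code and ncr_decode are hypothetical helpers for illustration, not part of flex. The sketch assumes the output buffer is at least as large as the input, since decoding only ever shrinks the text.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: the same arithmetic as the rule's action.
 * `ncr` plays the role of yytext, e.g. "&#58;". `+ 2` skips "&#",
 * and atoi() stops at the first non-digit, so the ';' is ignored. */
static int ncr_code(const char *ncr) {
    return atoi(ncr + 2);
}

/* Hypothetical helper: decode every "&#N;" with 32 <= N <= 126 and no
 * leading zero (the strings the pattern accepts); copy everything else
 * unchanged, as flex's default rule would.
 * `out` must have room for at least strlen(in) + 1 bytes. */
static void ncr_decode(const char *in, char *out) {
    while (*in) {
        if (in[0] == '&' && in[1] == '#' && isdigit((unsigned char)in[2])) {
            const char *p = in + 2;
            while (isdigit((unsigned char)*p)) p++;
            size_t ndigits = (size_t)(p - (in + 2));
            /* At most 3 digits, so atoi() cannot overflow. */
            if (*p == ';' && in[2] != '0' && ndigits <= 3) {
                int code = ncr_code(in);
                if (code >= 32 && code <= 126) {
                    *out++ = (char)code;
                    in = p + 1;      /* continue after the ';' */
                    continue;
                }
            }
        }
        *out++ = *in++;              /* not an accepted NCR: copy one char */
    }
    *out = '\0';
}
```

For instance, ncr_code("&#58;") yields 58, the code of ':', while inputs such as &#31; or &#127; fall outside 32..126 and are copied through unchanged.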
Here is the program that uses it:
/** ncr2a.c */
#include "ncr2a_lex.h"
typedef struct {
  int i, j; /** put here whatever you need to keep extra state */
} State;

int main(void) {
  yyscan_t scanner;
  State my_custom_data = {0, 0};

  yylex_init(&scanner);                   /* allocate the scanner state */
  yyset_extra(&my_custom_data, scanner);  /* attach the custom data to it */
  yylex(scanner);
  yylex_destroy(scanner);                 /* free the scanner state */
  return 0;
}
To build the ncr2a executable:
flex -R -oncr2a_lex.c --header-file=ncr2a_lex.h ncr2a_lex.l
cc -c -o ncr2a_lex.o ncr2a_lex.c
cc -o ncr2a ncr2a_lex.o ncr2a.c -lfl
Example
$ echo 'three colons &#58;&#58;&#58;' | ./ncr2a
three colons :::
This example uses stdin/stdout as input/output and calls yylex() once.
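To read from a file, set the input stream on the scanner object after yylex_init() (a reentrant scanner has no global yyin outside the *.l file):
yyset_in(fopen("input.txt", "r"), scanner);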
@Loki Astari's answer shows how to read from a string (buffer = yy_scan_string(text, scanner); yy_switch_to_buffer(buffer, scanner)).
To call yylex() once for each token, add a return statement to every rule in the *.l file that yields a full token, then call yylex() in a loop until it returns 0 (end of input).
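The shape of such a per-token loop can be sketched without flex. In the sketch below, next_token is a hypothetical hand-written stand-in for the generated yylex(scanner), and Lexer is a stand-in for the state that flex keeps behind yyscan_t. The token codes are also assumptions made for illustration. Only the calling pattern carries over to a real scanner: one call, one token, 0 at end of input.

```c
#include <stddef.h>

/* Hypothetical token codes; a real scanner defines its own.
 * 0 is reserved for end of input, as with yylex(). */
enum { TOK_EOF = 0, TOK_CHAR = 1 };

/* Hypothetical stand-in for the per-scanner state behind yyscan_t. */
typedef struct {
    const char *pos;  /* next unread input character */
    int value;        /* value of the most recent token */
} Lexer;

/* Stand-in for yylex(scanner): returns one token per call,
 * and 0 (TOK_EOF) once the input is exhausted. */
static int next_token(Lexer *lx) {
    if (*lx->pos == '\0')
        return TOK_EOF;
    lx->value = (unsigned char)*lx->pos++;
    return TOK_CHAR;
}

/* Driver loop with the same shape as
 *   while ((tok = yylex(scanner)) != 0) { ... }
 * Here it only counts the tokens it sees. */
static size_t count_tokens(const char *text) {
    Lexer lx = { text, 0 };
    size_t n = 0;
    while (next_token(&lx) != TOK_EOF)
        n++;
    return n;
}
```

Because all state lives in the Lexer value passed to each call, several of these loops can run side by side, which is the property that the reentrant option gives a flex scanner.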