I am working on a flex lexer for a language and ran into a problem that had never bothered me before: recognizing integers. I use a simple regular expression with trailing context (as suggested in https://stackoverflow.com/a/22413732/4492932):

%%
[[:digit:]]+/[^[:alpha:]] { printf("-%s-\n", yytext);}
. {}
\n {}
%%

The program works as designed when the input string consists only of digits. When the supposed number has an illegal letter "tail," the last digit of the number is not included in yytext:

1234
-1234-
1234abcd
-123-

Why is "4" not matched?

DYZ
  • You've told lex that `1234` is not ok because it is followed by a letter `a`. On the other hand, `123` is followed by `4` so it is allowed. – Piotr Siupa Oct 20 '22 at 05:52
  • When your parser encounters `1234abcd`, what do you want it to do? – Piotr Siupa Oct 20 '22 at 05:55
  • You seem to have copied that pattern from an answer without actually reading the answer, which says immediately after the pattern that you will also need another rule. – rici Oct 20 '22 at 06:21
  • To be honest, I've never liked that restrictive approach to lexing integers. In most languages, if the lexer analyses `42skidoo` as two tokens, the result will be tossed out as a syntax error, so it's really only useful in the case of languages in which a number might be immediately followed by an identifier. – rici Oct 20 '22 at 07:54
  • @rici, I take your last comment as the answer I am looking for. Thanks. – DYZ Oct 20 '22 at 14:08
  • @PiotrSiupa, I thought I wanted the scanner to report an invalid token. But I prefer rici's approach of shifting the burden onto the parser. – DYZ Oct 20 '22 at 14:10
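
The comments above explain the behaviour: flex picks the longest match, counting the trailing context, and for `1234abcd` the only way to satisfy `[[:digit:]]+/[^[:alpha:]]` is `123` followed by `4`. Below is a minimal sketch of the kind of extra rule the comments point to. The second rule, its error message, `%option noyywrap`, and `main` are assumptions added for illustration; the rule in the linked answer may be written differently.

```lex
%option noyywrap
%%
[[:digit:]]+/[^[:alpha:]]  { printf("-%s-\n", yytext); }
  /* Assumed extra rule: digits glued to letters form one invalid token.
     For "1234abcd" it matches 8 characters, more than the 4 characters
     ("123" plus context "4") of the rule above, so it is chosen. */
[[:digit:]]+[[:alpha:]]+   { fprintf(stderr, "invalid token: %s\n", yytext); }
.                          { }
\n                         { }
%%
int main(void) { return yylex(); }
```

With this sketch, `1234` followed by a newline should still print `-1234-`, while `1234abcd` should be reported as a single invalid token instead of `-123-`. Trailing context needs an actual character to match, so a number at the very end of input with no newline after it would still lose its last digit. The alternative preferred in the comments is to drop the trailing context, match plain `[[:digit:]]+`, and let the parser reject a number directly followed by an identifier.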
