
When a project has multiple lexers, some definitions, especially in the pattern part, are repeated in each lexer (e.g. whiteSpace [ \t]+ ). Having to define them each time is not nice, and with more complex patterns it is also a bit error-prone.

So far I have not been able to find anything, but is there a way to include a file containing (e.g.) patterns in a lexer?

albert

1 Answer


You are free to write your own preprocessor, and I suspect many people have done so. But as far as I know, no popular lex derivative includes such a feature. Certainly, neither flex nor the original AT&T lex has one.
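As a rough illustration of what such a preprocessor could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `%include "file"` directive is hypothetical (it is not understood by flex or lex), and the function name `expand_includes` and the file names are made up.

```python
import re
import sys
from pathlib import Path

# Hypothetical directive, NOT part of flex or lex:
#   %include "patterns.inc"
INCLUDE_RE = re.compile(r'^%include\s+"([^"]+)"\s*$')


def expand_includes(path, ancestors=frozenset()):
    """Return the text of `path` with each %include line replaced
    by the (recursively expanded) contents of the named file."""
    path = Path(path).resolve()
    if path in ancestors:
        raise ValueError(f"circular include: {path}")
    ancestors = ancestors | {path}

    out = []
    for line in path.read_text().splitlines(keepends=True):
        match = INCLUDE_RE.match(line)
        if match:
            # Included paths are resolved relative to the including file.
            text = expand_includes(path.parent / match.group(1), ancestors)
            if text and not text.endswith("\n"):
                text += "\n"  # keep the next line from being glued on
            out.append(text)
        else:
            out.append(line)
    return "".join(out)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Write the expanded lexer source to stdout.
    sys.stdout.write(expand_includes(sys.argv[1]))
```

Saved as, say, `expand_includes.py`, it would be run before flex, for example `python expand_includes.py scanner.l.in > scanner.l` followed by `flex scanner.l`. The shared definitions (such as `whiteSpace [ \t]+`) then live in one file that every lexer source includes.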

rici
  • Thanks for the answer. That is an unfortunate pity; an extra preprocessor would be a possibility. – albert Feb 14 '21 at 20:48
  • @albert: By the way, you have no need to define `whitespace`. You can use `[[:space:]]` to indicate "any whitespace character". There are a bunch of these Posix character classes available (also in C regexes, grep, and other Posix environments); see the flex manual for a list. (Basically, if `isxxxxx()` is in `<ctype.h>`, then `[:xxxxx:]` is valid inside character class brackets.) – rici Feb 14 '21 at 21:51
  • Thanks, I just used `whitespace` as an example to illustrate my question. – albert Feb 15 '21 at 08:56
  • @albert: yeah, I got that. But it seems to be a very common use case so knowing the technique seems useful to me. It definitely can aid readability. – rici Feb 15 '21 at 14:08
  • Note that the manual also states: "A word of caution. Character classes are expanded immediately when seen in the flex input. This means the character classes are sensitive to the locale in which flex is executed, and the resulting scanner will not be sensitive to the runtime locale. This may or may not be desirable." (https://westes.github.io/flex/manual/Patterns.html#Patterns). – albert Mar 09 '21 at 13:31
  • @albert: Yes, it does say that and it's true. There was a time when that warning would have been important, but these days it's pretty much an anachronism. The key is that flex does not handle wide characters (nor multibyte character sequences). So the `locale` classes only refer to the single-byte encoding in the locale. In a UTF-8 locale (which is basically all of them these days), the single-byte characters are precisely the low ASCII characters and so the single-byte classes in a UTF-8 locale are the same as the C/Posix locale. – rici Mar 09 '21 at 15:05
  • If you happen to be coding on a legacy system using a locale like ISO-8859-7 and you didn't want to generate a lexer conforming to that locale, you would want to run flex in the C locale. But I venture to say that the scenario is very uncommon these days. – rici Mar 09 '21 at 15:06