I have a grammar that works, except the keywords must be upper case. Is there a way to shotgun all the keywords such that lower case equivalents will not be rejected? If not, how do I affect each of them individually?
-
2https://gist.github.com/sharwell/9424666 ∆ https://stackoverflow.com/questions/1844562/how-to-match-a-string-but-case-insensitively ∆ both links come from a googke search for `antlr case insensitive keywords` (2nd and 3rd hit, respectively) – rici Dec 14 '17 at 18:25
-
gist.github.com/sharwell/9424666 uses deprecated code not compatible with ANTLR4 runtime 4.7. – nicomp Dec 14 '17 at 19:19
-
1if we're lucky [@Sam Harwell](https://stackoverflow.com/users/138304/sam-harwell) will come by and fix it :) – rici Dec 14 '17 at 20:48
-
I am using it, but the deprecated warning haunts me. – nicomp Dec 14 '17 at 20:53
1 Answers
I don't recommend input streams that convert case to make the keyword recognition case-insensitive. Such a stream will convert everything, strings, comments etc. even though that is a total waste of CPU cycles. A better approach is to tell explicitly in your grammar that you want (only) certain keywords to be case sensitive. The grammar is trivial:
fragment A: [aA];
fragment B: [bB];
...
fragment Z: [zZ];
KEYWORD1: K E Y W O R D '1';
...
The ATN for these rules is only marginally more complex (using 2 intervals instead of one for each letter, which is (in total) faster than a case conversion):
and as example the letter S:
Each node is a step the ATN simulator has to walk to parse a rule. Edge labels are symbols to match to allow this transition (with ɛ being the epsilon transition, i.e. an unconditional step without input consumption).

- 48,925
- 16
- 119
- 181
-
`using 2 intervals instead of one for each letter, which is much faster than a case conversion`. Did you make performance benchmarks for approving? – Ivan Kochurkin Dec 16 '17 at 13:44
-
No, because I think it's obvious. The actual interval test and a case lookup are probably not very different. But that's not all. Then you have to add all the case conversions such a special stream does and that can sum up. However, for small grammars we are probably talking about milliseconds here, still. Now when you consider also the obscurity to make a grammar work by requiring it to have a special input stream and showing no indication that it is case-insensitive, makes enough points for me to believe that the stream approach is the worse idea. Why using the inferior solution? – Mike Lischke Dec 16 '17 at 15:43
-
Ok, I'm going to make performance tests by myself in order to make a final conclusion. – Ivan Kochurkin Dec 17 '17 at 12:09
-
Keep in mind it's not only about performance. I hoped to make that clear with my comment. – Mike Lischke Dec 17 '17 at 12:53
-
Maybe later we'll revert back fragment rules in SQL grammars (T-SQL, PL/SQL, MySql). – Ivan Kochurkin Dec 17 '17 at 13:12