0

I have a grammar that works, except the keywords must be upper case. Is there a way to shotgun all the keywords such that lower case equivalents will not be rejected? If not, how do I affect each of them individually?

nicomp
  • 4,344
  • 4
  • 27
  • 60
  • 2
    https://gist.github.com/sharwell/9424666 ∆ https://stackoverflow.com/questions/1844562/how-to-match-a-string-but-case-insensitively ∆ both links come from a googke search for `antlr case insensitive keywords` (2nd and 3rd hit, respectively) – rici Dec 14 '17 at 18:25
  • gist.github.com/sharwell/9424666 uses deprecated code not compatible with ANTLR4 runtime 4.7. – nicomp Dec 14 '17 at 19:19
  • 1
    if we're lucky [@Sam Harwell](https://stackoverflow.com/users/138304/sam-harwell) will come by and fix it :) – rici Dec 14 '17 at 20:48
  • I am using it, but the deprecated warning haunts me. – nicomp Dec 14 '17 at 20:53

1 Answers1

2

I don't recommend input streams that convert case to make the keyword recognition case-insensitive. Such a stream will convert everything, strings, comments etc. even though that is a total waste of CPU cycles. A better approach is to tell explicitly in your grammar that you want (only) certain keywords to be case sensitive. The grammar is trivial:

fragment A: [aA];
fragment B: [bB];
...
fragment Z: [zZ];

KEYWORD1: K E Y W O R D '1';
...

The ATN for these rules is only marginally more complex (using 2 intervals instead of one for each letter, which is (in total) faster than a case conversion):

enter image description here

and as example the letter S:

enter image description here

Each node is a step the ATN simulator has to walk to parse a rule. Edge labels are symbols to match to allow this transition (with ɛ being the epsilon transition, i.e. an unconditional step without input consumption).

Mike Lischke
  • 48,925
  • 16
  • 119
  • 181
  • `using 2 intervals instead of one for each letter, which is much faster than a case conversion`. Did you make performance benchmarks for approving? – Ivan Kochurkin Dec 16 '17 at 13:44
  • No, because I think it's obvious. The actual interval test and a case lookup are probably not very different. But that's not all. Then you have to add all the case conversions such a special stream does and that can sum up. However, for small grammars we are probably talking about milliseconds here, still. Now when you consider also the obscurity to make a grammar work by requiring it to have a special input stream and showing no indication that it is case-insensitive, makes enough points for me to believe that the stream approach is the worse idea. Why using the inferior solution? – Mike Lischke Dec 16 '17 at 15:43
  • Ok, I'm going to make performance tests by myself in order to make a final conclusion. – Ivan Kochurkin Dec 17 '17 at 12:09
  • Keep in mind it's not only about performance. I hoped to make that clear with my comment. – Mike Lischke Dec 17 '17 at 12:53
  • Maybe later we'll revert back fragment rules in SQL grammars (T-SQL, PL/SQL, MySql). – Ivan Kochurkin Dec 17 '17 at 13:12