Don't use .+ ';' in this case: with it, you cannot distinguish between a ';' that ends an SQL statement and one that sits inside a string literal.
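To see the problem concretely, here is a minimal Java sketch in which regular expressions stand in for lexer rules. That substitution is an assumption made purely for illustration: the class name SemicolonDemo and both patterns are hypothetical and not part of the grammar. The naive pattern stops at the ';' inside the string literal, while the pattern that treats a quoted literal as one unit runs on to the real terminator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SemicolonDemo {
    // Naive: "select", whitespace, then anything up to the first ';'.
    static final Pattern NAIVE = Pattern.compile("(?is)select\\s.+?;");

    // String-aware: every atom is either a whole '...' literal ('' is an
    // escaped quote) or one character that is neither a quote nor a ';'.
    static final Pattern AWARE =
        Pattern.compile("(?is)select\\s(?:'(?:''|[^'\\r\\n])*'|[^';])+;");

    /** Returns the first match of p in input, or null if there is none. */
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String sql = "SELECT a FROM b WHERE c=';' and More;";
        System.out.println(firstMatch(NAIVE, sql)); // SELECT a FROM b WHERE c=';
        System.out.println(firstMatch(AWARE, sql)); // SELECT a FROM b WHERE c=';' and More;
    }
}
```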
To distinguish between a SqlAndExecute and a SqlStatement, simply match what both tokens have in common and then, at the end, change the type of the token like this:
Sql
 : SELECT SP SqlAtom+ ( ';'       {$type=SqlStatement;}
                      | EXECUTING {$type=SqlAndExecute;}
                      )
 ;
fragment SqlStatement : /* empty, used only for the token-type */ ;
fragment SqlAndExecute : /* empty, used only for the token-type */ ;
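The same "match the common part, then pick the type from the ending" idea can be sketched outside ANTLR. The Java below is only an illustration under assumptions: a regular expression replaces the lexer rule, a negative look-ahead replaces the check for the EXECUTING keyword, and the names TokenTypeDemo, Type and classify are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenTypeDemo {
    enum Type { SqlStatement, SqlAndExecute }

    // Common part: "select", one whitespace character, then one or more atoms.
    // An atom is a quoted literal, or one character that is not a quote or ';'
    // and does not start the standalone keyword "executing".
    // The possessive ++ keeps the atoms from being given back on failure.
    // Ending: group 1 captures ';', group 2 captures the keyword.
    private static final Pattern SQL = Pattern.compile(
        "(?i)select\\s"
        + "(?:'(?:''|[^'\\r\\n])*'|(?!executing(?![a-z]))[^';])++"
        + "(?:(;)|(executing))");

    /** Returns the type of the statement at the start of input, or null if none matches. */
    static Type classify(String input) {
        Matcher m = SQL.matcher(input);
        if (!m.lookingAt()) {
            return null;
        }
        // The match is the same either way; only the ending decides the type.
        return m.group(1) != null ? Type.SqlStatement : Type.SqlAndExecute;
    }
}
```

For example, classify("Select foo from EXECUTING") yields SqlAndExecute, while classify("SELECT a FROM b WHERE c=';' and More;") yields SqlStatement.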
Now, a SqlAtom is either a string literal or, when EXECUTING is not ahead, any character other than a single quote ('\'') or a semicolon (';'). The "when EXECUTING is not ahead" part must be handled by some manual extra look-ahead in the lexer and a semantic predicate.
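The look-ahead check itself is small. As a rough standalone sketch, assuming the input is a plain String with an explicit position rather than ANTLR's character stream (the class name LookAheadDemo and the extra parameters are hypothetical), it could look like this:

```java
class LookAheadDemo {
    /**
     * Returns true if text occurs at pos in input (ignoring case) and is not
     * directly followed by a letter, i.e. it is a keyword and not the start
     * of a longer identifier.
     */
    static boolean aheadIgnoreCase(String input, int pos, String text) {
        // regionMatches returns false when the region runs past the end of input
        if (!input.regionMatches(true, pos, text, 0, text.length())) {
            return false;
        }
        int after = pos + text.length();
        return after >= input.length() || !Character.isLetter(input.charAt(after));
    }
}
```

With the input "from EXECUTINGIT EXECUTING", the check fails at index 5 because a letter follows the keyword, and succeeds at index 17 where the keyword ends the input.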
A quick demo:
grammar T;

@lexer::members {
  private boolean aheadIgnoreCase(String text) {
    int i;
    for(i = 0; i < text.length(); i++) {
      String charAhead = String.valueOf((char)input.LA(i + 1));
      if(!charAhead.equalsIgnoreCase(String.valueOf(text.charAt(i)))) {
        return false;
      }
    }
    // there can't be a letter after 'text', otherwise it would be an identifier
    return !Character.isLetter((char)input.LA(i + 1));
  }
}

parse
 : (t=. {System.out.printf("\%-15s'\%s'\n", tokenNames[$t.type], $t.text);})* EOF
 ;

Sql
 : SELECT SP SqlAtom+ ( ';'       {$type=SqlStatement;}
                      | EXECUTING {$type=SqlAndExecute;}
                      )
 ;

Space
 : SP+ {skip();}
 ;

Id
 : ('a'..'z' | 'A'..'Z')+
 ;

fragment SqlAtom
 : {!aheadIgnoreCase("executing")}?=> ~('\'' | ';')
 | Str
 ;

fragment Str : '\'' ('\'\'' | ~('\'' | '\r' | '\n'))* '\'';

fragment SELECT    : S E L E C T;
fragment EXECUTING : E X E C U T I N G;

fragment SP : ' ' | '\t' | '\r' | '\n';

fragment C : 'c' | 'C';
fragment E : 'e' | 'E';
fragment G : 'g' | 'G';
fragment I : 'i' | 'I';
fragment L : 'l' | 'L';
fragment N : 'n' | 'N';
fragment S : 's' | 'S';
fragment T : 't' | 'T';
fragment U : 'u' | 'U';
fragment X : 'x' | 'X';

fragment SqlStatement  : ;
fragment SqlAndExecute : ;
And if you now parse the input:
Select bar from EXECUTINGIT EXECUTING
x
Select foo from EXECUTING
y
SELECT a FROM b WHERE c=';' and More;
the following will be printed to the console:
SqlAndExecute  'Select bar from EXECUTINGIT EXECUTING'
Id             'x'
SqlAndExecute  'Select foo from EXECUTING'
Id             'y'
SqlStatement   'SELECT a FROM b WHERE c=';' and More;'
EDIT
Note that the Sql rule now always produces either a SqlStatement or a SqlAndExecute token. In other words: there will never be a Sql token. If you want to match either a SqlStatement or a SqlAndExecute, create a parser rule that matches one of them:
sql
 : SqlStatement
 | SqlAndExecute
 ;
and use sql in your parser rule(s) instead of Sql.