You definitely could do this, but it would make the source code much harder to read. Imagine this:
if if == 1
As far as actually implementing it, the lexer wouldn't have to be changed at all. If the lexer matches "if" in the source, it returns a token of type IF. Suppose we have the following assignment statement, where if is a variable name that is being assigned the value 1.
if <- 1;
The lexer's token stream to be fed to the parser is:
IF, LARROW, INTLITERAL, SEMICOLON
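To make the lexer's behaviour concrete, here is a rough hand-written sketch in Java. It is not the lexer from my project (with CUP you would normally generate one, for example with JFlex); the class name SketchLexer and the method tokenTypes are made up for illustration. The point it shows is that the keyword check looks only at the text of the word, never at the context, so "if" always comes out as IF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer, for illustration only.
class SketchLexer {
    // Returns the token type names for the given source text.
    static List<String> tokenTypes(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (src.startsWith("<-", i)) {
                out.add("LARROW");
                i += 2;
            } else if (c == ';') {
                out.add("SEMICOLON");
                i++;
            } else if (Character.isDigit(c)) {
                while (i < src.length() && Character.isDigit(src.charAt(i))) i++;
                out.add("INTLITERAL");
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < src.length() && Character.isLetterOrDigit(src.charAt(i))) i++;
                String word = src.substring(start, i);
                // The keyword check ignores context: "if" is always IF, never ID.
                out.add(word.equals("if") ? "IF" : "ID");
            } else {
                throw new IllegalArgumentException("Unexpected character: " + c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [IF, LARROW, INTLITERAL, SEMICOLON]
        System.out.println(tokenTypes("if <- 1;"));
    }
}
```

Deciding whether an IF token is acting as a keyword or as a name is left entirely to the parser.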
I might have the following productions to describe an assignment statement (with integer rvalues):
assignStmt::= id:i LARROW intExpr:e SEMICOLON {: RESULT = new AssignmentStatement(i, e); :}
intExpr::= INTLITERAL:i {: RESULT = i.intVal; :}
id::= ID:i {: RESULT = i.strVal; :}
LARROW, ID, IF, INTLITERAL, and SEMICOLON are terminals, which are tokens returned by the lexer, and assignStmt, id, and intExpr are non-terminals. ID represents an identifier (e.g. class/variable/method name).
After failing the production for an if statement, we'll eventually enter the first production for an assignment statement. We expand the id non-terminal, whose only production is ID, but the token I want to match is IF, so the assignStmt production fails altogether.
For my language to allow a variable to be named "if", all I have to do is add a second production for id:
assignStmt::= id:i LARROW intExpr:e SEMICOLON {: RESULT = new AssignmentStatement(i, e); :}
intExpr::= INTLITERAL:i {: RESULT = i.intVal; :}
id::= ID:i {: RESULT = i.strVal; :}
|IF {: RESULT = "if"; :}
Note that | defines an alternate production for the non-terminal. Now we have that second production for the id non-terminal, which matches the current token, and ultimately results in matching an assignment statement.
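If it helps to see the effect of the extra alternative in ordinary code, here is a small recursive-descent sketch in Java. This is not what CUP generates (CUP builds a table-driven LALR parser); it is a hand-written stand-in that covers only assignStmt. Tok, AssignParser, sampleStream and the keywordAsId flag are all hypothetical names. With the flag off, id accepts only ID and the statement is rejected; with it on, id also accepts IF.

```java
import java.util.List;

// Hypothetical token type: a type name plus the matched text.
final class Tok {
    final String type; // "ID", "IF", "LARROW", "INTLITERAL" or "SEMICOLON"
    final String text;
    Tok(String type, String text) { this.type = type; this.text = text; }
}

// Hand-written stand-in for the generated parser; handles assignStmt only.
final class AssignParser {
    private final List<Tok> toks;
    private final boolean keywordAsId;
    private int pos = 0;

    AssignParser(List<Tok> toks, boolean keywordAsId) {
        this.toks = toks;
        this.keywordAsId = keywordAsId;
    }

    // Token stream for the source text: if <- 1;
    static List<Tok> sampleStream() {
        return List.of(new Tok("IF", "if"), new Tok("LARROW", "<-"),
                       new Tok("INTLITERAL", "1"), new Tok("SEMICOLON", ";"));
    }

    private Tok peek() {
        if (pos >= toks.size()) throw new IllegalStateException("unexpected end of input");
        return toks.get(pos);
    }

    private Tok expect(String type) {
        Tok t = peek();
        if (!t.type.equals(type)) {
            throw new IllegalStateException("expected " + type + ", found " + t.type);
        }
        pos++;
        return t;
    }

    // id ::= ID | IF   (the IF alternative is tried only when keywordAsId is set)
    private String parseId() {
        Tok t = peek();
        if (t.type.equals("ID")) { pos++; return t.text; }
        if (keywordAsId && t.type.equals("IF")) { pos++; return "if"; }
        throw new IllegalStateException("expected identifier, found " + t.type);
    }

    // assignStmt ::= id LARROW intExpr SEMICOLON
    // Returns "name=value" instead of an AST node to keep the sketch self-contained.
    String parseAssignStmt() {
        String name = parseId();
        expect("LARROW");
        int value = Integer.parseInt(expect("INTLITERAL").text);
        expect("SEMICOLON");
        return name + "=" + value;
    }

    public static void main(String[] args) {
        // Prints if=1
        System.out.println(new AssignParser(sampleStream(), true).parseAssignStmt());
    }
}
```

Constructing the parser with keywordAsId set to false and calling parseAssignStmt on the same stream throws an IllegalStateException, which corresponds to the failure described for the original grammar.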
AssignmentStatement is an AST node defined as follows:
class AssignmentStatement {
    String varName; // name on the left of <-
    int intVal;     // integer value on the right of <-

    AssignmentStatement(String s, int i) { varName = s; intVal = i; }
}
Once the parser accepts the source as syntactically correct, nothing else should be affected. The names of your variables shouldn't affect the later stages of compilation, unless you write those stages in a way that gives particular names special treatment.
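As a rough illustration of a later stage that doesn't care about names, here is a tiny evaluator sketch in Java. SketchEvaluator, execute and lookup are hypothetical and stand in for whatever semantic analysis or code generation a real compiler would do. The AssignmentStatement class is repeated only so the example compiles on its own.

```java
import java.util.HashMap;
import java.util.Map;

// Same shape as the AST node above, repeated so the sketch is self-contained.
class AssignmentStatement {
    String varName;
    int intVal;
    AssignmentStatement(String s, int i) { varName = s; intVal = i; }
}

// Hypothetical later stage: executes assignments against a symbol table.
class SketchEvaluator {
    private final Map<String, Integer> symbols = new HashMap<>();

    void execute(AssignmentStatement stmt) {
        // The name is just a map key here; "if" is treated like any other string.
        symbols.put(stmt.varName, stmt.intVal);
    }

    // Returns the stored value, or null if the name was never assigned.
    Integer lookup(String name) {
        return symbols.get(name);
    }

    public static void main(String[] args) {
        SketchEvaluator eval = new SketchEvaluator();
        eval.execute(new AssignmentStatement("if", 1));
        // Prints 1
        System.out.println(eval.lookup("if"));
    }
}
```

By this point the keyword has already been reduced to a plain string, so the symbol table has no way to tell "if" apart from "x".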