An interpreter is usually built in three steps:
Lexical
Recognize simple patterns such as keywords, numbers, and symbols. The standard way is to define a regular expression for each kind of token. There are well-known algorithms for transforming a regular expression into a state machine, so you can recognize any word in your input.
This is done by:
- Create a state machine from each regexp
- Join all state machines
- Make the state machine deterministic
- Make the state machine minimal.
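As a rough illustration of the lexical step, here is a minimal sketch in Python. It does not build the state machine by hand; it assumes the standard `re` module is an acceptable stand-in (note that `re` is a backtracking matcher, not a minimized deterministic automaton). The token names (NUMBER, IDENT, OP, SKIP) and the `tokenize` function are hypothetical, chosen only for this example.

```python
import re

# Hypothetical token kinds and patterns for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=]"),
    ("SKIP", r"\s+"),
]

# "Join" the individual patterns into one alternation of named groups.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return a list of (kind, lexeme) pairs; raise on an unknown character."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":  # whitespace is recognized but not emitted
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling `tokenize("c = a + 4")` yields the pairs for `c`, `=`, `a`, `+` and `4` in that order. A hand-written or generated lexer would follow the same interface but run the minimized state machine instead of calling `re`.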
Syntax
In this part, you take the sequence of tokens as input and build a tree from them. Depending on the complexity of your language, there are different parsing strategies: top-down (LL) or bottom-up (LR, LALR, etc.).
file
|...
|- c=a+b
| |- a+b
| | |- a
| | |- +
| | |- b
| |- =
| |- c
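To make the tree-building step concrete, here is a minimal sketch of a top-down (recursive-descent) parser in Python. It assumes the tokens arrive as (kind, lexeme) pairs and that the grammar is just `IDENT = operand (+ operand)*`; the function name `parse_statement` and the node labels "assign" and "add" are hypothetical.

```python
def parse_statement(tokens):
    """Parse `IDENT = expr` from (kind, lexeme) pairs into a nested-tuple tree."""
    pos = 0

    def peek():
        # Return the current token, or a synthetic EOF token at the end.
        return tokens[pos] if pos < len(tokens) else ("EOF", "")

    def expect(kind, lexeme=None):
        # Consume the current token if it matches, otherwise report an error.
        nonlocal pos
        tok = peek()
        if tok[0] != kind or (lexeme is not None and tok[1] != lexeme):
            raise SyntaxError(f"expected {lexeme or kind}, got {tok[1]!r} at token {pos}")
        pos += 1
        return tok

    def operand():
        kind, lexeme = peek()
        if kind not in ("NUMBER", "IDENT"):
            raise SyntaxError(f"expected operand, got {lexeme!r} at token {pos}")
        return expect(kind)

    def expr():
        # expr := operand ('+' operand)*, built left-associatively.
        node = operand()
        while peek() == ("OP", "+"):
            expect("OP", "+")
            node = ("add", node, operand())
        return node

    target = expect("IDENT")
    expect("OP", "=")
    value = expr()
    expect("EOF")
    return ("assign", target, value)
```

For the tokens of `c=a+b` this returns an "assign" node whose children are the `c` token and an "add" node holding `a` and `b`, which mirrors the tree drawn above. A bottom-up parser would produce the same tree but is normally generated by a tool rather than written by hand.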
Semantic
Go through your tree and perform the operations. For example, at the a+b node you get the values of a and b and sum them; then you return to the node above and assign the result to c.
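The walk described here can be sketched as a small recursive function in Python. This is only an illustration under assumptions: the tree is taken to be nested tuples with hypothetical node kinds ("NUMBER", "IDENT", "add", "assign"), and variables live in a plain dictionary called `env`.

```python
def evaluate(node, env):
    """Walk a nested-tuple tree bottom-up; `env` maps variable names to values."""
    kind = node[0]
    if kind == "NUMBER":
        return int(node[1])
    if kind == "IDENT":
        if node[1] not in env:
            raise NameError(f"undefined variable {node[1]!r}")
        return env[node[1]]
    if kind == "add":
        # Evaluate both children first, then combine their values.
        return evaluate(node[1], env) + evaluate(node[2], env)
    if kind == "assign":
        # Evaluate the right-hand side, then store it under the target name.
        value = evaluate(node[2], env)
        env[node[1][1]] = value
        return value
    raise ValueError(f"unknown node kind {kind!r}")


# Usage: c = a + b with a = 1 and b = 3.
env = {"a": 1, "b": 3}
tree = ("assign", ("IDENT", "c"), ("add", ("IDENT", "a"), ("IDENT", "b")))
result = evaluate(tree, env)  # result is 4, and env["c"] is now 4
```

Each node kind gets one branch, so adding an operator means adding one node kind in the parser and one branch here.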
Final note:
Be careful to design a powerful error/warning mechanism from the start: type of error, full description, line and character where the mistake is detected, severity level of the error/warning, etc.
It could also be useful to store in each node the parsed input (the string), the interpreted content (e.g. LEX_NUMBER), and the interpreted value (e.g. 4).
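One possible shape for these records, sketched in Python with dataclasses. All names here (`Token`, `Diagnostic`, `Severity`, the `render` method, the code "E001") are hypothetical and only illustrate which fields might be worth carrying around.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"


@dataclass(frozen=True)
class Token:
    text: str      # parsed input, e.g. "4"
    kind: str      # interpreted content, e.g. "LEX_NUMBER"
    value: object  # interpreted value, e.g. 4
    line: int      # position where the token starts
    column: int


@dataclass(frozen=True)
class Diagnostic:
    severity: Severity
    code: str      # type of error, e.g. "E001"
    message: str   # full description
    line: int      # where the mistake was detected
    column: int

    def render(self):
        """Format the diagnostic as 'line:column: severity [code]: message'."""
        return f"{self.line}:{self.column}: {self.severity.value} [{self.code}]: {self.message}"


tok = Token(text="4", kind="LEX_NUMBER", value=4, line=1, column=5)
diag = Diagnostic(Severity.ERROR, "E001", "undefined variable 'x'", 3, 7)
```

Keeping the position on every token means later stages (parser, evaluator) can report errors at the right place without re-scanning the input.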