I am building a new simple programming language (just to learn how compilers work in my free time).
I have already built a lexer which can tokenize my source code into lexemes.
However, I am now stuck on how to build an Abstract Syntax Tree (AST) from the tokens when the source code contains an expression with operator precedence.
For simplicity, I shall include only four basic operators (+, -, / and *) in addition to brackets (). Operator precedence will follow the BODMAS rule.
I realize I might be able to convert the expression from infix to prefix/postfix, build the tree from that, and substitute it into the rest of the AST.
However, I am not sure if that is possible. Even if it is possible, I am not sure how efficient it might be or how difficult it might be to implement.
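To make the postfix idea concrete: once an expression is already in postfix order, the tree is usually formed with a single stack. The following is only a minimal sketch under assumptions. The names PostfixToTree, Node and fromPostfix are hypothetical, tokens are plain strings, and the input is assumed to be valid postfix. It does not cover the infix-to-postfix conversion itself.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

public class PostfixToTree {
    // Hypothetical AST node: a leaf holds an operand, an inner node holds an operator.
    static final class Node {
        final String value;
        final Node left;
        final Node right;

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        // Fully parenthesized form, handy for checking the tree shape.
        @Override
        public String toString() {
            return left == null ? value : "(" + left + " " + value + " " + right + ")";
        }
    }

    static boolean isOperator(String t) {
        return t.equals("+") || t.equals("-") || t.equals("*") || t.equals("/");
    }

    // Assumes the list is well-formed postfix; no error handling.
    static Node fromPostfix(List<String> postfix) {
        Deque<Node> stack = new ArrayDeque<>();
        for (String t : postfix) {
            if (isOperator(t)) {
                Node right = stack.pop(); // the right operand was pushed last
                Node left = stack.pop();
                stack.push(new Node(t, left, right));
            } else {
                stack.push(new Node(t, null, null));
            }
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // a + b * c written in postfix is: a b c * +
        System.out.println(fromPostfix(List.of("a", "b", "c", "*", "+")));
        // prints (a + (b * c))
    }
}
```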
Is there some trivial way to form the tree in-place without having to convert to prefix/postfix first?
I came across the Shunting Yard algorithm which seems to do this. However, I found it to be quite a complicated algorithm. Is there something simpler, or should I go ahead with implementing the Shunting Yard algorithm?
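For comparison, one commonly used alternative to Shunting Yard is recursive descent parsing, where each precedence level gets its own method and the tree is built directly from the infix tokens. The following is a minimal sketch, not a definitive implementation. The names ExprParser, Node and parse are hypothetical, tokens are assumed to be plain strings like the lexer output, and only +, -, *, / and brackets are handled.

```java
import java.util.List;

public class ExprParser {
    // Hypothetical AST node: a leaf holds an operand, an inner node holds an operator.
    static final class Node {
        final String value;
        final Node left;
        final Node right;

        Node(String value, Node left, Node right) {
            this.value = value;
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return left == null ? value : "(" + left + " " + value + " " + right + ")";
        }
    }

    private final List<String> tokens;
    private int pos = 0;

    ExprParser(List<String> tokens) {
        this.tokens = tokens;
    }

    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : null;
    }

    // expression := term (('+' | '-') term)*   (lowest precedence, left-associative)
    Node parseExpression() {
        Node left = parseTerm();
        while ("+".equals(peek()) || "-".equals(peek())) {
            String op = tokens.get(pos++);
            left = new Node(op, left, parseTerm());
        }
        return left;
    }

    // term := factor (('*' | '/') factor)*   (binds tighter than + and -)
    Node parseTerm() {
        Node left = parseFactor();
        while ("*".equals(peek()) || "/".equals(peek())) {
            String op = tokens.get(pos++);
            left = new Node(op, left, parseFactor());
        }
        return left;
    }

    // factor := '(' expression ')' | operand
    // Any other token is treated as an operand; a real parser would validate it.
    Node parseFactor() {
        String t = peek();
        if (t == null) {
            throw new IllegalStateException("unexpected end of input");
        }
        pos++;
        if (t.equals("(")) {
            Node inner = parseExpression();
            if (!")".equals(peek())) {
                throw new IllegalStateException("expected ')'");
            }
            pos++;
            return inner;
        }
        return new Node(t, null, null);
    }

    static Node parse(List<String> tokens) {
        ExprParser parser = new ExprParser(tokens);
        Node root = parser.parseExpression();
        if (parser.peek() != null) {
            throw new IllegalStateException("unexpected token: " + parser.peek());
        }
        return root;
    }

    public static void main(String[] args) {
        System.out.println(parse(List.of("a", "+", "b", "*", "c")));
        // prints (a + (b * c))
        System.out.println(parse(List.of("(", "a", "+", "b", ")", "*", "c")));
        // prints ((a + b) * c)
    }
}
```

In this sketch the grammar itself encodes precedence: parseExpression only ever sees complete terms, so b * c is grouped before the + is applied, and no operator stack or postfix conversion is needed.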
Currently, my lexer tokenizes the following program as shown below.
I am demonstrating with a Java program for syntax familiarity.
Source Program:
public class Hello
{
    public static void main(String[] args)
    {
        int a = 5;
        int b = 6;
        int c = 7;
        int r = a + b * c;
        System.out.println(r);
    }
}
Lexer output:
public
class
Hello
{
public
static
void
main
(
String
[
]
args
)
{
int
a
=
5
;
int
b
=
6
;
int
c
=
7
;
int
r
=
a
+
b
*
c
;
System
.
out
.
println
(
r
)
;
}
}