I'm writing a very basic web server that has to support an extremely limited special server side scripting language. Basically all I need to support is "echo", addition/subtraction/multiplication (no division) with only 2 operands, a simple "date()" function that outputs the date and the use of the "&" operator to concatenate strings.
An example could be:
echo "Here is the date: " & date();
echo "9 x 15 = : & 9*15;
I've gone through and created the code necessary to generate tokens, but I'm not sure I'm using the right tokens.
I created tokens for the following:
ECHO - The echo command
WHITESPACE - Any whitespace
STRING - A string inside quotations
DATE - The date() function
CONCAT - the & operator for concatenation
MATH - Any instance of binary operation (5+4, 9*2, 8-2, etc)
TERM - The terminal character (;)
The MATH one I am particularly unsure about. Typically I see people create a token specifically for integers and then for each operator as well, but since I ONLY want to allow binary operations, I thought it made sense to group it into one token. If I were to do everything separately, I would have to do some extra work to ensure that I never accepted "5+4+1".
So question 1 is am I on the right track with which tokens to use?
My next question is what to I do with these tokens next to ensure correct syntax? The approach that I had thought of was to basically say, "Okay I know I have this token, here is a list of tokens that are allowed to come next based on the current token. Is the next token in the list?"
Based on that, I made a list of all of my tokens as well as what tokens are valid to appear directly after them (didn't include whitespace for simplicity).
ECHO -> STRING|MATH|DATE
STRING -> TERM|CONCAT
MATH -> TERM|CONCAT
DATE -> TERM|CONCAT
CONCAT -> STRING|MATH|DATE
The problem is I'm not sure at all how to best implement this. Really I need to keep track of whitespace as well to make sure there are spaces between the tokens. But that means I have to look ahead two tokens at a time which is getting even more intimidating. I also am not sure how to manage the "valid next tokens" stuff without just some disgusting section of if blocks. Should I be checking for valid syntax before trying to actually execute the script, or should I do it all at once and just throw an error when I reach an unexpected token? In this simple example, everything will always work just fine parsing left to right, there's no real precedence rules (except the MATH thing, but that's part of why I combined it into one token even though it feels wrong.) Even so, I wouldn't mind designing a more scalable and elegant solution.
In my research about writing parsers, I see a lot of references to creating "accept()" and "expect()" functions but I can't find any clear description of what they are supposed to do or how they are supposed to work.
I guess I'm just not sure how to implement this, and then how to actually come up with a resulting string at the end of the day.
Am I heading in the right direction and does anybody know of a resource that might help me understand how to best implement something simple like this? I am required to do it by hand and cannot use a tool like ANTLR.
Thanks in advance for any help.