As part of a project, I need to perform lexical analysis on a very simple program stored in a file. It was suggested that I use tokenization to split the program into its lexical elements. I've never used this technique and don't know how to implement it. A token can be:
- a keyword such as IF, WHILE, ADD, SUB, SET, TRUE, FALSE, etc.
- a parenthesis (open or closed)
- a number
- a variable
An example of an input program is:
(BLOCK (SET n 10) (SET sum 0) (WHILE (GT n 0) (BLOCK (SET sum (ADD sum n)) (SET n (SUB n 1)) (PRINT sum))))
How do I use tokenization to recognize and divide these program elements?
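
For illustration, here is a minimal sketch of one common way to do this: a single regular expression with one named group per token category, scanned left to right. It is written in Python only because no language is named above. The keyword set (including BLOCK, GT and PRINT, which come from the example program), the `Token` type, the `tokenize` function, and the rule that numbers are unsigned integers are all assumptions made for this sketch, not part of the project's specification.

```python
import re
from typing import Iterator, NamedTuple

# Assumed keyword set: the names listed above plus those used in the example
# program. The real language may define more.
KEYWORDS = {"IF", "WHILE", "ADD", "SUB", "SET", "TRUE", "FALSE",
            "BLOCK", "GT", "PRINT"}


class Token(NamedTuple):
    kind: str   # "LPAREN", "RPAREN", "NUMBER", "KEYWORD" or "VARIABLE"
    text: str   # the exact characters matched in the source


# One named group per category; alternatives are tried in this order.
TOKEN_RE = re.compile(r"""
    (?P<LPAREN>\()
  | (?P<RPAREN>\))
  | (?P<NUMBER>\d+\b)                  # assumption: unsigned integers only
  | (?P<WORD>[A-Za-z_][A-Za-z0-9_]*)   # keyword or variable, decided later
  | (?P<SKIP>\s+)                      # whitespace separates tokens
  | (?P<ERROR>.)                       # any other character is an error
""", re.VERBOSE)


def tokenize(source: str) -> Iterator[Token]:
    """Yield the tokens of `source` in order, skipping whitespace."""
    for match in TOKEN_RE.finditer(source):
        kind = match.lastgroup
        text = match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {text!r} at offset {match.start()}")
        if kind == "WORD":
            # A word is a keyword if it is in the keyword set, else a variable.
            kind = "KEYWORD" if text in KEYWORDS else "VARIABLE"
        yield Token(kind, text)


if __name__ == "__main__":
    for token in tokenize("(SET n (SUB n 1))"):
        print(token)
```

To process a file, the same function can be applied to the file's contents read as one string. The resulting token stream is flat; matching the parentheses and building the nested structure would be a separate parsing step.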