
As part of a project, I need to perform lexical analysis on a very simple program stored in a file. It was suggested that I use tokenization to split the program into its lexical elements. I've never used this technique and don't know how to implement it. A token can be:

  • a keyword like: IF, WHILE, ADD, SUB, SET, TRUE, FALSE etc ...
  • a parenthesis (open or closed)
  • a number
  • a variable

An example of an input program is:

(BLOCK (SET n 10) (SET sum 0) (WHILE (GT n 0) (BLOCK (SET sum (ADD sum n)) (SET n (SUB n 1)) (PRINT sum))))

How do I use tokenization to recognize and separate these program elements?

  • See if this helps: https://stackoverflow.com/questions/43067869/lexical-analyser-in-java – pringi Feb 11 '22 at 15:38
  • [Tokenization](https://en.wikipedia.org/wiki/Lexical_analysis#Tokenization) – David Conrad Feb 11 '22 at 16:27
  • For this case it looks like a simple FSM that tracks whether it is in a run of alphanumeric characters would suffice. Emit a token for each paren and each such run. – David Conrad Feb 11 '22 at 16:29
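
The FSM idea from the last comment can be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the question does not name a language, so Python is an assumption, and the names `KEYWORDS`, `classify`, and `tokenize` as well as the token labels (`LPAREN`, `RPAREN`, `KEYWORD`, `NUMBER`, `VARIABLE`) are hypothetical. The keyword set is also assumed: it contains the keywords listed in the question plus `BLOCK`, `GT`, and `PRINT` from the example program, and would need extending to cover the "etc ...".

```python
from typing import List, Tuple

# Assumed keyword set: names listed in the question plus those in its example.
KEYWORDS = {"IF", "WHILE", "ADD", "SUB", "SET", "TRUE", "FALSE",
            "BLOCK", "GT", "PRINT"}


def classify(text: str) -> str:
    """Label one alphanumeric run as a keyword, a number, or a variable."""
    if text in KEYWORDS:
        return "KEYWORD"
    if text.isdigit():
        return "NUMBER"
    # Anything else is treated as a variable name (an assumption; a run
    # such as "1abc" would also land here and may deserve an error).
    return "VARIABLE"


def tokenize(source: str) -> List[Tuple[str, str]]:
    """Split source into (kind, text) pairs using a two-state scan:
    either inside a run of alphanumeric characters or outside one."""
    tokens: List[Tuple[str, str]] = []
    run: List[str] = []  # characters of the run currently being read

    def flush() -> None:
        # Leaving a run: emit it as a single token and reset the buffer.
        if run:
            text = "".join(run)
            tokens.append((classify(text), text))
            run.clear()

    for ch in source:
        if ch.isalnum():
            run.append(ch)              # stay in (or enter) a run
        elif ch == "(":
            flush()
            tokens.append(("LPAREN", ch))
        elif ch == ")":
            flush()
            tokens.append(("RPAREN", ch))
        elif ch.isspace():
            flush()                     # whitespace only separates tokens
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    flush()                             # emit a run that ends the input
    return tokens


if __name__ == "__main__":
    for kind, text in tokenize("(SET n 10)"):
        print(kind, text)
```

Under these assumptions, `tokenize("(SET n 10)")` yields an open paren, the keyword `SET`, the variable `n`, the number `10`, and a close paren, in that order. To process a program stored in a file, read the file's contents into a string and pass that string to `tokenize`.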
