0

I'm doing an assignment for school right now that focuses on building a tokenizer and parser for simple sets of instructions. This instruction set uses EBNF grammar which we have to output to the user. As a simple example, imagine this instruction set code looks like this:

set 0, 2 * (4 + 20)
halt

And the EBNF grammar is given like this:

<Program>    -> <Statement> { <Statement> }
<Statement>  -> <Set> | 'halt'
<Set>        -> set ('write'|<Expr>), ('read'|<Expr>)
<Expr>       -> <Term> {(+|-) <Term>}
<Term>       -> <Factor> {(*|/|%) <Factor>}
<Factor>     -> <Number> | 'D['<Expr>']' | '('<Expr>')'
<Number>     -> 0 | (1...9){0...9}

In this case, <...> is what I output to the user, {...} means choose 0 or more, (...) means choose 1 or more and '...' is a string literal.

So in this case, when running the program, the correct output should look like this:

Program          
Statement        
Set              // set
Expr             // 0
Term             // 0 is a term
Factor           // 0 is a factor
Number           // 0 is finally a number
Expr             // go to the next string, "2*(4+2)"
Term             // start with the number 2
Factor           // 2 is a factor
Number           // 2 is a number
Factor           // after 2, we have a * so after that is a factor
Expr             // the factor leads to a (...) which leads to an expr
Term             // the expr leads to a term 4
Factor           // 4 is a factor
Number           // 4 is returned as a number
Term             // we have a + so the next string (20) is a term
Factor           // 20 is a factor
Number           // return 20 as a number
Statement        // halt is a statement, end

I've already built the tokenizer portion of the assignment, so running the above instruction set produces a std::vector<std::string> that looks like this (where each new string is separated by a new line):

set
0
2*(4+20)
halt

Now the parser is where I am stuck. I first push each string into a std::queue, examine the string, parse it, and then pop it from the queue. I already have it so that a set statement pushes the string "Set" in another std::vector<std::string> so I can print it out later.

I also have a function, parse_expression(std::string& str) to parse my expressions. The real trouble I have is with how I could correctly parse the 2 * (4 + 20) part of it.

My teacher told me to go character by character through the string, check whether it is a number (which I know how to do), or if it matches a '+', '-', '/', '*', '%' character, or if the next character is a 'D' then I should parse another expression.

I'm sort of confused on how I should attempt this though. If I go character by character, I can get the first number, go until I hit a non number character, and then just push out Term followed by Factor followed by Number. But how can I then go back to that original character and keep pushing onwards towards the next ones. Almost like, how do I go back to the original level I was at so that I can correctly determine whether or not I need to create another Expr, or whether or not I should just be doing another term.

I realize this is a long, confusing question, but I would appreciate any push in the right direction, whether it be something glaringly obvious that I'm doing wrong, or whether it be how a parser actually works.

Alex
  • 2,145
  • 6
  • 36
  • 72
  • 1
    Search e.g. Wikipedia for recursive descent parsing. The basic (no frills) technique is rather easy to apply. The best description I've seen is in Wirth's "Algorithms + Data Structures = Programs" (Sorry, haven't got it handy, and it was a long while back. Plus it is written in Pascal...). – vonbrand Oct 11 '15 at 23:16
  • 1
    See my SO answer on how to build a recursive descent parser: http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769 – Ira Baxter Oct 11 '15 at 23:19
  • 1
    Part of your issue might be that "2*(4+20)" isn't a single token. It's actually 7: 2, *, (, 4, +, 30 and ) – Michael Albers Oct 11 '15 at 23:41
  • @MichaelAlbers Ah, that makes sense. But how would I handle correctly determining the type I am dealing with? Especially with expressions within the parentheses? – Alex Oct 12 '15 at 18:17
  • @Alex What do you mean by type? The data type of the result of the expression? – Michael Albers Oct 13 '15 at 01:39

0 Answers0