1

I have finished implementing Shunting-yard Algorithm, but have some questions:

  1. How does this Algorithm validate that the input is legal, in other words how could it detect if a++b is legal or not (sure it is not)

  2. What is the second step that I should do? Shunting-yard converts 1+2 to 1 2 +


Update regarding 1:

After some trying I think it does, For example take a++b it would be a+b+ then when I evaluate it I would take a then + but since I have only one variable in hand that's an error.

Is that always the case For non valid expressions?

  • 1. It doesn't; you need to add that yourself, 2. If your goal is to evaluate the expression, the next step is to evaluate it. – molbdnilo Aug 06 '20 at 15:26
  • @molbdnilo after some trying I think it does, are you sure I should check that manually? For example take a++b it would be a+b+ then when I evaluate it I would take a then + but since I have only one variable in hand that's an error –  Aug 06 '20 at 15:30
  • But I am not sure if it will catch all cases like that, may you kindly give me an example of a case that can't be caught –  Aug 06 '20 at 15:31
  • @danel, you've got it wrong...see the answer below – reyad Aug 06 '20 at 15:39
  • @daniel If you detect the error during evaluation, you did add that and the parser didn’t validate it. – molbdnilo Aug 06 '20 at 15:41
  • @daniel -- How would you use shunting yard for more complex things such as `sin(0.1) + sqrt(5)`, where say, you need to detect if `sqrt` is actually an available function? And what if `a++b` is a meaningful expression in another context? – PaulMcKenzie Aug 06 '20 at 15:49
  • @PaulMcKenzie: The parser doesn't need to tell you whether `sqrt` is an actual function because the syntax is unambiguously a function call. Like undefined variables, undefined functions are semantic errors which are caught at a different phase. (A different case is where your language allows `sin` and `sqrt` to be unary operators, allowing you to write `sin 0.1`. But then they're handled by the parser just like any other unary operator.) For how to parse function calls with SY, see https://stackoverflow.com/a/16392115/1566221 – rici Aug 06 '20 at 16:55

2 Answers2

3

1. Syntax errors

It depends on how, precisely, you implement the algorithm, but in the version usually found by an internet search there is no guarantee that an ungrammatical expression will be correctly rejected by the Shunting Yard Algorithm. Many incorrect expressions will produce incorrect postfix strings (as you note), or even correct postfix strings. In particular, if you have unary operators, the algorithm (as usually presented) cannot really differentiate between prefix use, where the operator precedes the operand, or postfix use, where the operator follows the operand.

This will be a serious problem if your target language has operators which can be used either as prefix or postfix operators, with different semantics (such as the C family's ++ and -- operators). Since the algorithm does not distinguish the two cases, the semantic difference is lost.

There's a similar, more common problem with operators which can be used either as binary infix operators or as prefix operators, such as the - operator. Unless the two uses are distinguished, the postfix output will not be interpretable, because when the - is reached, the evaluator cannot know whether it applies to one or two operands. (In addition, it is likely that the unary minus operator will have been processed with incorrect precedence, since the desired precedence of unary minus is higher than multiplication and division. However, for most arithmetic expressions, using the incorrect precedence will not change the numeric value of the result, since -(x * y) and (-x) * y have precisely the same value. Incorrect results will be evident if you implement a modulo operator.)

The Shunting Yard algorithm will detect unbalanced parentheses, because unbalanced parentheses will either result in the parse stack being overpopped or having too many values at the end of the parse.

It is relatively easy to augment the Shunting Yard algorithm with a very small state machine sufficient to classify different unambiguous use of operators with more than one syntactic significance; that state machine is also sufficient to detect other syntax errors mentioned above: operators being incorrectly placed, or missing altogether.

Because it is necessary in practical uses to correctly distinguish between unary and binary negation; the different meanings of prefix and postfix operators; and the different use of parentheses (grouping vs. function calls), production parsers using Shunting Yard will include some additional syntactic mechanism which will also detect syntax errors. An example of such an algorithm can be found in this answer.


2. RPN as an intermediate step

There is absolutely no need to use RPN as an intermediate result; the Shunting Yard algorithm can be used to

  • directly evaluate arithmetic expressions (if the expressions don't include conditional or looping constructs),

  • to output executable code for a stack machine compiler (or, with a bit more effort, three-address code for a more realistic machine), or more generally

  • to produce a syntax tree representing the parsed expression, which can be used for any of the above purposes and other semantic analysis tasks.

To produce the syntax tree, you need to push operands onto the parser stack, rather than directly outputting them to the output stream. Also, when you push an operator onto the stack, you actually push the syntax node representing the application of that operator: for a binary operator, it combines with the top two stack slots. (And for a unary operator, with the top stack slot.) If you want to use Shunting Yard as a direct evaluator, you use the same strategy but pushing the operator onto the stack causes the evaluation of that operator with its operands, identified in the same way.

The RPN intermediate representation really provides no value whatsoever. I have no idea why it is so popular.

rici
  • 234,347
  • 28
  • 237
  • 341
-1

let's see if I can break it down for you. Shunting yard algorithm does one of the following by breaking a infix notation:

  1. either produce a postfix notation string(also known as Reverse Polish notation)
  2. or an abstract syntax tree.

In you case, it's postfix notation.

[note: I hope you know about postfix notation, if not then, read this.]

Now, postfix notation makes it very easy to evaluate a mathematical expression. I'll show you how to evaluate a postfix notation:

In a simple description:
    (1) Infix: A + B
        Postfix: A B +
    (2) Infix: A + B * C
        Postfix: A B C * +
Let's observe the part A+B*C i.e. A+(B*C)

If we represent it in Postfix, it would be like: A B C * +(by applying shunting-yard)
Now, the algorithm to calculate it
    (1) Take a stack
    (2) When we see a number, we push it to stack
    (3) When we see a operator, we pop two numbers out of stack and calculate them with help of operator and push the result into stack again
    (4) We do it till the end
    (5) At last, only a number would be left in stack, that is our answer.
Let's visualise it:
    (1) [A]
    (2) [A, B]
    (3) [A, B, C]
    (4) [A, R1] where R1 = B*C
    (5) [R2] where R2 = A+R1

I hope, you have understood that, shunting yard will help you convert infix to postfix and then, you can easily evaluate the postfix notation.

Now, the question is how to detect a++b error:

Now, observe what happens for a, +, +, b tokens(as you've stated in the comment: a++b is tokenized a, +, +, b tokens):

I've taken the pseudocode from wikipedia(got lazy, didn't want to write myself):

else if the token is an operator then:
        while ((there is a operator at the top of the operator stack)
              and ((the operator at the top of the operator stack has greater precedence)
               or (the operator at the top of the operator stack has equal precedence and the token is left associative))
              and (the operator at the top of the operator stack is not a left parenthesis)):
            pop operators from the operator stack onto the output queue.
        push it onto the operator stack.

accroding to this: a, +, +, b would take the following form in output queue:

a, b, +, +

a, b, +, + is just simply wrong, cause, according to the postfix evaluation rules the following would occur:

1. [a] // output queue is now [b, +, +]
2. [a, b] // output queue is now [+, +]
3. [r1] // output queue is now [+]
4. error // cause: notice there's still one '+' operator left in queue
// but, only one number 'r1' left in the 'stack'
// so, error

I hope it is clear to you now...

reyad
  • 1,392
  • 2
  • 7
  • 12
  • 1
    Much thanks, but still I didn't get an answer to my first question, will it detect error Always? –  Aug 06 '20 at 15:42
  • "When we see a operator, we pop two numbers out of stack and calculate them with help of operator and push the result into stack again" what if there are no two numbers? that's an error but will that be always the case when an invalid expression is given? –  Aug 06 '20 at 15:43
  • @daniel, yes for course, if it is a binay operator and there's one number left in the stack, that'll obviously be an error – reyad Aug 06 '20 at 15:44
  • ok and could you give me an example when it doesn't detect errors? –  Aug 06 '20 at 15:45
  • well, you've to perform error detection when, you're implementing shunting-yard, when you're detecting tokens, if you get any token wrong, then stop parsing and throw error – reyad Aug 06 '20 at 15:47
  • and for detecting symantic error, like I've given told in the previous comment, in stack contains only one number...or stack does not contain proper number type(like double, but you're calculating int etc...)...throw error – reyad Aug 06 '20 at 15:49
  • But cases like a++b should be left to the reverse Polish notation to check right? –  Aug 06 '20 at 15:49
  • no, it won't, its the duty what you've to perform, while you're reading or parsing tokens(in shunting-yard), cause `a++b` would come to you as a token, but is it a valid token name(obviously not)...so, you've to preform check on token names, if it is not a perfect token name, you stop parsing and output error... – reyad Aug 06 '20 at 15:51
  • you can do as follows: String token = get_token(); if(check_token_name(token) == invalid) { throw InvalidTokenNameError; }. design it in your language(whatever language you're using). – reyad Aug 06 '20 at 15:53
  • in my code if input is a++b then the tokens are a + + b (separated) Plus why it won't detect error? I explained up how it will.... –  Aug 06 '20 at 15:55
  • I am editing the answer, wait a few more minutes to know how to detect the error, in postfix evaluation time – reyad Aug 06 '20 at 15:57
  • @daniel, sorry for late reply...got stuck into some task, I've explained how you may detect the error of a++b when you're evaluating postfix – reyad Aug 06 '20 at 16:28
  • Detecting it during evaluation is generally too late. You need to detect the error at compile time when parsing the expression. – user207421 Oct 23 '20 at 00:50