40

I needed some help with creating custom trees given an arithmetic expression. Say, for example, you input this arithmetic expression:

(5+2)*7

The result tree should look like:

    *
   / \
  +   7
 / \
5   2

I have some custom classes to represent the different types of nodes, i.e. PlusOp, LeafInt, etc. I don't need to evaluate the expression, just create the tree, so I can perform other functions on it later. Additionally, the negative operator '-' can only have one child, and to represent '5-2', you must input it as 5 + (-2).

Some validation on the expression would be required to ensure each type of operator has the correct the no. of arguments/children, each opening bracket is accompanied by a closing bracket.

Also, I should probably mention my friend has already written code which converts the input string into a stack of tokens, if that's going to be helpful for this.

I'd appreciate any help at all. Thanks :)

(I read that you can write a grammar and use antlr/JavaCC, etc. to create the parse tree, but I'm not familiar with these tools or with writing grammars, so if that's your solution, I'd be grateful if you could provide some helpful tutorials/links for them.)

ChocolateBear
  • 415
  • 1
  • 4
  • 7
  • I should add that I'll also be doing a similar thing for logical expressions (e.g. ¬A V B). – ChocolateBear Jan 04 '11 at 01:43
  • See my SO answer on how to build a recursive descent parser, which is really easy for expressions. That answer links to a second, which shows how to build trees with such a parser. http://stackoverflow.com/questions/2245962/is-there-an-alternative-for-flex-bison-that-is-usable-on-8-bit-embedded-systems/2336769#2336769 – Ira Baxter Apr 14 '16 at 21:52

5 Answers5

56

Assuming this is some kind of homework and you want to do it yourself..

I did this once, you need a stack

So what you do for the example is:

    parse    what to do?                Stack looks like
      (      push it onto the stack     (
      5      push 5                     (, 5
      +      push +                     (, 5, +
      2      push 2                     (, 5, +, 2
      )      evaluate until (           7            
      *      push *                     7, *
      7      push 7                     +7, *, 7
      eof    evaluate until top         49

The symbols like "5" or "+" can just be stored as strings or simple objects, or you could store the + as a +() object without setting the values and set them when you are evaluating.

I assume this also requires an order of precedence, so I'll describe how that works.

in the case of: 5 + 2 * 7

you have to push 5 push + push 2 next op is higher precedence so you push it as well, then push 7. When you encounter either a ) or the end of file or an operator with lower or equal precedence you start calculating the stack to the previous ( or the beginning of the file.

Because your stack now contains 5 + 2 * 7, when you evaluate it you pop the 2 * 7 first and push the resulting *(2,7) node onto the stack, then once more you evaluate the top three things on the stack (5 + *node) so the tree comes out correct.

If it was ordered the other way: 5 * 2 + 7, you would push until you got to a stack with "5 * 2" then you would hit the lower precedence + which means evaluate what you've got now. You'd evaluate the 5 * 2 into a *node and push it, then you'd continue by pushing the + and 3 so you had *node + 7, at which point you'd evaluate that.

This means you have a "highest current precedence" variable that is storing a 1 when you push a +/-, a 2 when you push a * or / and a 3 for "^". This way you can just test the variable to see if your next operator's precedence is < = your current precedence.

if ")" is considered priority 4 you can treat it as other operators except that it removes the matching "(", a lower priority would not.

cbare
  • 12,060
  • 8
  • 56
  • 63
Bill K
  • 62,186
  • 18
  • 105
  • 157
  • 4
    Isn't it just Edsger Dijkstra's "Shunting Yard" algorithm? ( http://en.wikipedia.org/wiki/Shunting-yard_algorithm ) – SasQ Feb 25 '12 at 21:24
  • 8
    @SasQ Better to try to explain something than pass a link to it--teaches both you and them. Also, I never saw this as an algorithm, someone mentioned to me that calculations can be done in a tree and I did it--didn't know the name to look up (although I knew it had been done repeatedly, it's not complicated--which is why I answered with it) – Bill K Feb 27 '12 at 17:24
  • 2
    @Bill K: I don't depreciate your effort in explaining. It's good. I only paste a link as a reference, if someone would want to check out more. Yes, it's better to explain than pass a link, but if it's explained already in the linked article, it's better to pass the link and save time instead of reinventing the wheel. There is too much of copies of the same knowledge repeated over the Net. – SasQ Feb 28 '12 at 02:38
  • 1
    @Bill K: “When you encounter either a ) or the end of file or an operator with lower or equal precedence you start calculating the stack…” What's for 1 + 3 * 2 * 4? When I meet second ‘*’ I have to calculate the whole stck, i.e. 1+3*2 which is 7. This leads to incorrect answer (7 * 4 =) 28. Whereas correct answer is 25 (= 1 + 24). – Artem Pelenitsyn Apr 09 '12 at 09:16
  • 1
    @ArtemPelenitsyn * is higher than +, so you have to stop evaluating when you hit + in the stack, not do the entire stack. I think you are right, that should be explicitly stated that you stop if you hit a lower precedence operator in the stack – Bill K Apr 10 '12 at 02:34
  • @SasQ I think Passing a link and plus explaining makes a good combination, learning for yourself and for others. +1 for Bill K – DayTimeCoder Apr 18 '12 at 08:46
  • @BillK I understood how this works for calculating. But how do you get the "result tree" OP was asking for? Seems to like the tree will be built from the leaves to the top, isn't it? – Fitz Jul 26 '16 at 12:56
  • @Fitz after 5 years of this post being here I've never noticed that (And I've revisited it multiple times!). At some point while writing up the answer I switched gears a little. I started off right, this is how you build a tree--if you don't do the actual math of an operation but instead create a node , but then I went off and described a calculation instead. Still a few people who came to this question seem to appreciate it so I'll leave it. Would it be valuable for me to amend the answer to have it create a tree instead, or is it obvious enough? – Bill K Jul 26 '16 at 16:23
  • @BillK Sorry to bring you 5 years back in time! I just found this post searching for the same answers as OP. Well, it seems to me that if we proceed the same way as you described it, we will create the leaves nodes first and then add their parents climing up the tree as we go further. Meaning we need to have nodes with a link to the father. At least, this is what I understood. I asked because I am unsure of my comprehension of this solution as an application to the drawing of the tree (and because I actually need to draw a tree). – Fitz Jul 26 '16 at 22:54
  • @Billk And yes, this answer is very valuable as it seems to me to be an overkill to use ANTLR in easy cases like those (eventhought, I won't have to do only calculations but retrieve datas from a database before calculating them, just the action slighly changes). – Fitz Jul 26 '16 at 22:57
  • @fitz I was working on a tree description like this one, it is more difficult to describe clearly as a tree. It might be easiest to convert it to a stack then a tree, not sure. I'll add an addendum if I get some time later... Or maybe an entirely new answer. – Bill K Jul 27 '16 at 08:33
  • @BillK I understand it now. You use the Shunting Yard algorithm to build the queue and then build the tree out of it following [this](http://stackoverflow.com/questions/7866587/create-a-binary-tree-from-an-algebraic-expression). – Fitz Jul 28 '16 at 08:57
13

I wanted to respond to Bill K.'s answer, but I lack the reputation to add a comment there (that's really where this answer belongs). You can think of this as a addendum to Bill K.'s answer, because his was a little incomplete. The missing consideration is operator associativity; namely, how to parse expressions like:

49 / 7 / 7

Depending on whether division is left or right associative, the answer is:

49 / (7 / 7) => 49 / 1 => 49

or

(49 / 7) / 7 => 7 / 7 => 1

Typically, division and subtraction are considered to be left associative (i.e. case two, above), while exponentiation is right associative. Thus, when you run into a series of operators with equal precedence, you want to parse them in order if they are left associative or in reverse order if right associative. This just determines whether you are pushing or popping to the stack, so it doesn't overcomplicate the given algorithm, it just adds cases for when successive operators are of equal precedence (i.e. evaluate stack if left associative, push onto stack if right associative).

Ray Weidner
  • 131
  • 1
  • 3
  • You should not add comments as answer. Try to make it a stand alone answer or get some rep and add a comment. – bummi Sep 05 '13 at 07:56
10

The "Five minute introduction to ANTLR" includes an arithmetic grammar example. It's worth checking out, especially since antlr is open source (BSD license).

Bill K
  • 62,186
  • 18
  • 105
  • 157
Cameron Skinner
  • 51,692
  • 2
  • 65
  • 86
  • Thanks for the link. Although, for the time being, I'll be trying the stack solution that Bill posted, because a part of me would like to do it all myself, but if I come back to ANTLR, I would definitely use this link, as it seems to be the most helpful introduction for a novice :) – ChocolateBear Jan 04 '11 at 15:03
  • 2
    @CameronSkinner, it is to be noted that ANTLR cannot be considered to be open source project, as an open source project should also open source the documentation, (which is a part of an project), but documentation for ANTLR is not free, so it is a free software, but not open source [from wikipedia](http://en.wikipedia.org/wiki/Antlr) – Avinash R Jan 02 '13 at 08:09
3

Several options for you:

  1. Re-use an existing expression parser. That would work if you are flexible on syntax and semantics. A good one that I recommend is the unified expression language built into Java (initially for use in JSP and JSF files).

  2. Write your own parser from scratch. There is a well-defined way to write a parser that takes into account operator precedence, etc. Describing exactly how that's done is outside the scope of this answer. If you go this route, find yourself a good book on compiler design. Language parsing theory is going to be covered in the first few chapters. Typically, expression parsing is one of the examples.

  3. Use JavaCC or ANTLR to generate lexer and parser. I prefer JavaCC, but to each their own. Just google "javacc samples" or "antlr samples". You will find plenty.

Between 2 and 3, I highly recommend 3 even if you have to learn new technology. There is a reason that parser generators have been created.

Also note that creating a parser that can handle malformed input (not just fail with parse exception) is significantly more complicated that writing a parser that only accepts valid input. You basically have to write a grammar that spells out the various common syntax errors.

Update: Here is an example of an expression language parser that I wrote using JavaCC. The syntax is loosely based on the unified expression language. It should give you a pretty good idea of what you are up against.

Contents of org.eclipse.sapphire/plugins/org.eclipse.sapphire.modeling/src/org/eclipse/sapphire/modeling/el/parser/internal/ExpressionLanguageParser.jj

Sumit Singh
  • 15,743
  • 6
  • 59
  • 89
Konstantin Komissarchik
  • 28,879
  • 6
  • 61
  • 61
  • Thanks for providing all the options available to me - really helps when making a final decision. I'll be trying the stack solution that Bill posted first I think, because it seems like I can be more flexible with it and also because a part of me would like to do it all myself, but if it doesn't work out or if I realise I must use parser generator, I'll definitely look into JavaCC. Thanks a lot :) – ChocolateBear Jan 04 '11 at 15:05
1

the given expression (5+2)*7 we can take as infix

Infix  :     (5+2)*7
Prefix :     *+527

from the above we know the preorder and inorder taversal of tree ... and we can easily construct tree from this. Thanks,

knils
  • 115
  • 1
  • 9
  • 1
    [link](http://stackoverflow.com/questions/1946896/conversion-from-infix-to-prefix) for conversion tips. – Fitz Jul 26 '16 at 23:22