Algebraic Simplification on Abstract Syntax Tree

Question

I've designed a parser in C that is able to generate AST, but when I begin to implement simplifications it really got messed up. I've successfully implemented rules for the summation below;

x + 0 -> x

x + x -> 2 * x

etc.

But it took huge amount of effort and code to do it. What I did was to search entire tree and try to find a pattern that I can use (lots of recursion) then if there was a cascade of PLUS nodes, I've added them to a list, then worked on that list (summing numbers and combining variables etc.) then I created another tree from that list, and merged it to existing one. It was this paper I used to implement it. In short given the expression 2*x+1+1+x+0 I got 3*x+2. And it was just summation that got me into so much trouble, I can even imagine the advanced stuff. So I realized I was ding something wrong.

I've read this thread but I'm really confused about term rewriting systems (what it really is, how to implement in C).

Is there a more general and effective way to do simplification on AST? Or how to write a term rewriting system in C

Interesting anecdote, no, really :) There's no question here, though. -- well, there is, I suppose: how to write a term rewriting system in c, but that is way broad for an SO question. — 500 - Internal Server Error, Sep 20 '19 at 18:01
Yeah, sorry. It should be is there a more general and effective way to do it? or how to write a term rewriting system in C — Arif Balik, Sep 20 '19 at 18:03
You can get a term rewriting engine and implement your algrebraic simplifications directly. See http://www.semdesigns.com/Products/DMS/SimpleDMSDomainExample.html for a fully worked example. *Writing* a term rewriting engine is more complex than you would think, especially if you want to handle commutativity and associativ properly. — Ira Baxter, Nov 06 '19 at 21:09

Laurence R. Ugalde · Answer 1 · 2019-10-15T17:12:32.643

Term rewriting is (in simple words) like the 2 examples you provided. (How to convert x + 0 to x in a AST?). It is about pattern matching on AST's, and once there is a match, a conversion of an equivalent expression. It is also called a term rewriting rule.

Note that having a term rewriting rule is not the absolute or general solution of algebraic simplification. The general solution involves having many rewriting rules (you showed two of them), and apply them in a given AST repeatedly until no one success.

Then, the general solution involves the process or coordination on the application of the rewriting rules. i.e. in order to avoid the re-application of a rule that has previously failed, as an example.

There is not a unique way to do it. There are several systems. For proprietary systems it is not known because they keeps it in secrecy, but there are open source systems too, for example Mathomatica is written in C.

I recommend you to check the open system Fōrmulæ. In this, the process of coordination of rewriting rules (which is called "the reduction engine") is relatively simple. It is written in Java. The advantage of this system is that rewriting rules are not hardwired/hardcoded in the system or the reduction engine (they are hot pluggable). Coding a rewriting rule involves the process of pattern matching and conversion, but no when or how it will be called (it follows the Hollywood principle).

In the specific case of Fōrmulæ:

The reduction engine is based (in general terms) on the post-order tree traversal algorithm. so when a node is "visited", its sub-nodes were already visited and (possibly) transformed, but it is possible to alter such that flow (i.e. to solve the unwanted referentiation of a variable in an assignment x <- 5). Note that it is not just a tree traversal, the AST is being actually changed in the process.
In order to efficiently manage the (possibly hundred or thousand) of rewriting rules, every rule has a type of expression where it is applicable, and when a single node is "visited", only the associated rules are checked for a match. For example, your 2 rules can only be applied to "addition" nodes of an AST.

Rewriting rules are not limited to algebraic simplification, they can be used in many other fields such as programming (Fōrmulæ is also its programming language, see examples of Fōrmulæ programs, or in automatic or assisted theorem proving.

What overly simple rewriting engines like "Formulae" get wrong is they fail to handle associative and commutative rewrites properly. Consider the formula "x + 3 - x" which should "rewrite" to just "3", If you represent this as a tree, and pattern match using a tree pattern for "a + - a -> 0", the pattern won't match. (Maybe Formulae implements this, but it isn't obvious from your description, and "relatively simple" suggests this capabiliy isn't there, because it isn't relatively simple to implement.) — Ira Baxter, Nov 06 '19 at 21:12
Yes, one can use rewriting for a lot more than just "formulas". We use it to implement massive transformations on source code for software systems, including nasty languages such as C++. See http://www.semdesigns.com/Products/DMS/DMSToolkit.html. It is worth noting that you need a LOT more than just the rewriting part to do this, see: https://www.semanticdesigns.com/Products/DMS/LifeAfterParsing.html. Even parsing is hard for most standard langauges, see: https://stackoverflow.com/questions/243383/why-cant-c-be-parsed-with-a-lr1-parser/1004737#1004737 — Ira Baxter, Nov 06 '19 at 21:16

Algebraic Simplification on Abstract Syntax Tree

1 Answers1