14

I'm creating a CAS (Computer Algebra System) in PHP, but I'm stuck right now. I am using this website.

Now I wrote a tokenizer. It will convert an equation like this:

1+2x-3*(4-5*(3x))

to this:

NUMBER PLUS_OPERATOR NUMBER VAR[X] MINUS_OPERATOR NUMBER MULTIPLY_OPERATOR GROUP

(where group is another set of tokens). How can I simplify this equation? Yeah, I know what you can do: adding X-vars, but they are in the sub-group. What is the best method I can use for handling those tokens?

Matt Ball
  • 354,903
  • 100
  • 647
  • 710
www.data-blogger.com
  • 4,076
  • 7
  • 43
  • 65
  • Your VAR token has an associated string. Why doesn't your NUMBER token have an associated number? – Ira Baxter Mar 23 '11 at 22:54
  • Depending on your constraints, why not try building an [OOP Interpreter](http://sourcemaking.com/design_patterns/interpreter)? It should be easier than dealing with tokens, and the tree should represent itself. It should be fairly easy to work with as well – ircmaxell Mar 23 '11 at 23:03
  • @David Heffernan: One advantage of PHP in dealing with expressions and programming languages is the sigil variables. You can name your variables `$operator` and `$var` without having to worry about conflicting with keywords in the programming language. – Joey Adams Mar 23 '11 at 23:04

3 Answers3

21

A really useful next step would be to construct a parse tree:

enter image description here

You'd make one of these by writing an infix parser. You could either do this by writing a simple recursive descent parser, or by bringing in the big guns and using a parser generator. In either case, it helps to construct a formal grammar:

expression: additive

additive: multiplicative ([+-] multiplicative)*

multiplicative: primary ('*' primary)*

primary: variable
       | number
       | '(' expression ')'

Note that this grammar does not handle the 2x syntax, but it should be easy to add.

Notice the clever use of recursion in the grammar rules. primary only captures variables, numbers, and parenthesized expressions, and stops when it runs into an operator. multiplicative parses one or more primary expressions delimited by * signs, but stops when it runs into a + or - sign. additive parses one or more multiplicative expressions delimited by + and -, but stops when it runs into a ). Hence, the recursion scheme determines operator precedence.

It isn't too terribly difficult to implement a predictive parser by hand, as I've done below (see full example at ideone.com):

function parse()
{
    global $tokens;
    reset($tokens);
    $ret = parseExpression();
    if (current($tokens) !== FALSE)
        die("Stray token at end of expression\n");
    return $ret;
}

function popToken()
{
    global $tokens;
    $ret = current($tokens);
    if ($ret !== FALSE)
        next($tokens);
    return $ret;
}

function parseExpression()
{
    return parseAdditive();
}

function parseAdditive()
{
    global $tokens;

    $expr = parseMultiplicative();

    for (;;) {
        $next = current($tokens);
        if ($next !== FALSE && $next->type == "operator" &&
            ($next->op == "+" || $next->op == "-"))
        {
            next($tokens);
            $left = $expr;
            $right = parseMultiplicative();
            $expr = mkOperatorExpr($next->op, $left, $right);
        } else {
            return $expr;
        }
    }
}

function parseMultiplicative()
{
    global $tokens;

    $expr = parsePrimary();

    for (;;) {
        $next = current($tokens);
        if ($next !== FALSE && $next->type == "operator" &&
            $next->op == "*")
        {
            next($tokens);
            $left = $expr;
            $right = parsePrimary();
            $expr = mkOperatorExpr($next->op, $left, $right);
        } else {
            return $expr;
        }
    }
}

function parsePrimary()
{
    $tok = popToken();
    if ($tok === FALSE)
        die("Unexpected end of token list\n");
    if ($tok->type == "variable")
        return mkVariableExpr($tok->name);
    if ($tok->type == "number")
        return mkNumberExpr($tok->value);
    if ($tok->type == "operator" && $tok->op == "(") {
        $ret = parseExpression();
        $tok = popToken();
        if ($tok->type == "operator" && $tok->op == ")")
            return $ret;
        else
            die("Missing end parenthesis\n");
    }

    die("Unexpected $tok->type token\n");
}

Okay, so now you have this lovely parse tree, and even a pretty picture to go with it. Now what? Your goal (for now) might be to simply combine terms to get a result of the form:

n1*a + n2*b + n3*c + n4*d + ...

I'll leave that part to you. Having a parse tree should make things much more straightforward.

Community
  • 1
  • 1
Joey Adams
  • 41,996
  • 18
  • 86
  • 115
3

PHP is good at strings, numbers, and arrays. But it is a poor language for implementing symbolic formula manipulation, because it has no native machinery for processing "symbolic expressions", for which you really want trees. Yes, you can implement all that machinery. What is harder is to do the algebraic manipulations. Its quite a lot of work if you want do build something semi-sophisticated. Ideally you want machinery to help you write the transformations directly and easily.

For instance, how will you implement arbitrary algebra rules? Associativity and commutativity? Term "matching at a distance"?, e.g.

  (3*a+b)-2(a-b)+a ==> 3a-b

You can look at how a simple CAS can be implemented using our DMS program transformation system. DMS has hard mathematical constructs like commutativity and associativity built in, and you can write algebra rules explicitly to operate on symbolic formulas.

Ira Baxter
  • 93,541
  • 22
  • 172
  • 341
2

The book Computer Algebra and Symbolic Computation: Mathematical Methods by Joel S. Cohen describes an algorithm for automatic simplification of algebraic expressions.

This algorithm is used in the Symbolism computer algebra library for C#. Going with your example, the following C# program:

var x = new Symbol("x");

(1 + 2 * x - 3 * (4 - 5 * (3 * x)))
    .AlgebraicExpand()
    .Disp();

displays the following at the console:

-11 + 47 * x
dharmatech
  • 8,979
  • 8
  • 42
  • 88
  • 1
    In fact, every graduate student in CS ends up implementing one of these in LISP as a finger-exercise. The implementation concepts go way back to the 60s when the first of these systems was implemented in LISP (MacSyma?). The key idea is "represent the expression as a tree" (essentially trivial in LISP), and write procedures to realize the algebra rules". More sophisticated schemes include pattern matchers/rewrite rules instead of hand-written procedures. (This book will surely contain discussions of these). Mathematica is a classic CAS based on this. See my answer for another. – Ira Baxter Sep 24 '14 at 10:51
  • @IraBaxter [mpl](https://github.com/dharmatech/mpl) is a Scheme library which implements many of the algorithms in the Cohen texts. – dharmatech Sep 24 '14 at 17:14