21

This isn't a school assignment or anything, but I realize it's a mostly academic question. But, what I've been struggling to do is parse 'math' text and come up with an answer.

For Example - I can figure out how to parse '5 + 5' or '3 * 5' - but I fail when I try to correctly chain operations together.

(5 + 5) * 3

It's mostly just bugging me that I can't figure it out. If anyone can point me in a direction, I'd really appreciate it.

EDIT Thanks for all of the quick responses. I'm sorry I didn't do a better job of explaining.

First - I'm not using regular expressions. I also know there are already libraries available that will take, as a string, a mathematical expression and return the correct value. So, I'm mostly looking at this because, sadly, I don't "get it".

Second - What I've tried doing (is probably misguided) but I was counting '(' and ')' and evaluating the deepest items first. In simple examples, this worked; but my code is not pretty and more complicated stuff crashes. When I 'calculated' the lowest level, I was modifying the string.

So... (5 + 5) * 3

Would turn into 10 * 3

Which would then evaluate to 30

But it just felt 'wrong'.

I hope that helps clarify things. I'll certainly check out the links provided.

Kena
  • 6,891
  • 5
  • 35
  • 46
Rob P.
  • 14,921
  • 14
  • 73
  • 109
  • 2
    Are you using regular expressions? How are you currently parsing the text? – Ryan Hayes Jun 03 '10 at 20:36
  • 2
    The Dragon Book. (http://en.wikipedia.org/wiki/Dragon_Book_(computer_science) is an excellent introduction to these types of things. –  Jun 03 '10 at 20:38
  • More specifically, what have you tried and why doesn't it work? SO works best on specifics, and you're likely to get a better answer when you post more details. – David Thornley Jun 03 '10 at 20:39
  • Ronald Mak's book - Writing Compilers and Interpreters. http://www.amazon.com/Writing-Compilers-Interpreters-Ronald-Mak/dp/0471113530 is also a great resource. –  Jun 03 '10 at 20:40
  • Possibly helpful thread... http://stackoverflow.com/questions/34081/parsing-where-can-i-learn-about-it –  Jun 03 '10 at 20:43
  • This is by no means an 'academic' question - parsing expressions is a core problem in Computer Science and Software Engineering. – Escualo Jun 04 '10 at 17:07
  • @Arrieta - I think Rob P. means that he understands it is a "solved" problem, just that he was stuck understanding it. Hopefully he has found his answer an can now mark this question as "answered". – Andre Artus Jun 09 '10 at 10:22
  • @Rob, for a simple language like this you can actually use a single regular expression for your Lexer/Scanner/Tokenizer: e.g. "(-?[0-9]+|[*+-\/\(\)])". What the regex will not do is bracket/parenthesis matching. For that you should use one of the other techniques (Shunting Yard/ recursive descent) – Andre Artus Jun 09 '10 at 10:32
  • Here is a good math-parser: https://github.com/scicave/math-parser – Mohammed Samir Jan 01 '21 at 20:54
  • Here is a good parser for math expressions written in latex: https://github.com/scicave/math-latex-parser – Mohammed Samir Jan 01 '21 at 20:55

13 Answers13

16

Ages ago when working on a simple graphing app, I used this algorithm (which is reasonably easy to understand and works great for simple math expressions like these) to first turn the expression into RPN and then calculated the result. RPN was nice and fast to execute for different variable values.

Of course, language parsing is a very wide topic and there are many other ways of going about it (and pre-made tools for it too)

Matti Virkkunen
  • 63,558
  • 9
  • 127
  • 159
  • This is a great idea for a small calculator parser and it can be implemented fast. but it can also get quite hairy fast if you want to do slightly more complex stuff like function calls (`sin`,`cos`) – shoosh Jun 03 '10 at 20:41
  • 2
    @shoosh: Actually, function calls could quite easily be implemented as unary operators (although the Wikipedia page seems to ignore them, the algorithm can be extended to take them into account). For multiple parameters, you could introduce a binary comma operator that packs values together. – Matti Virkkunen Jun 03 '10 at 20:50
  • ~ What math calls would have multiple parameters inside a paren besides maybe higher order math? If you're doing a parser on stuff with multiple dimensions, something tells me you've already done a simpler parser... also, (in simple terms) `sin( VALUE )` & `VALUE ::= [paren-open] term [operator term] [paren-close]` no? – jcolebrand Jun 03 '10 at 20:58
  • 1
    you're right ofcourse. The shit really hits the fan though when you want the same function name to be overloaded to take either one or two parameters. – shoosh Jun 03 '10 at 23:06
  • 2
    @shoosh: Nah, if you consider `f(x, y)` to be f applied to a single argument, that is `(x, y)`, then you'll be fine. You could see this argument as a tuple, created by the `,` operator (as Matti alluded to). – Joren Jun 05 '10 at 22:01
9

@Rising Star [I hoped to add this as a comment, but the formatting failed]

It may seem counterintuitive, but a binary tree is both simpler and more flexible. A node, in this case, would be either a constant (number) or an operator. A binary tree makes life somewhat easier when you decide to extend the language with elements like control flow, and functions.

Example:

((3 + 4 - 1) * 5 + 6 * -7) / 2

                  '/'
                /     \
              +        2
           /     \
         *         *
       /   \     /   \
      -     5   6     -7
    /   \
   +     1
 /   \
3     4

In the case above the scanner has been programmed to read '-' followed by a series of digits as a single number, so "-7" gets returned as the value component of the "number" token. '-' followed by whitespace is retured as a "minus" token. This makes the parser somewhat easier to write. It fails on the case where you want "-(x * y)", but you can easily change the expression to "0 - exp"

Andre Artus
  • 1,850
  • 15
  • 21
  • A simple application of the composite pattern will see inner (composite) nodes as "BinaryOperator" and leaf nodes as "Constant". This is of course no longer strictly speaking a binary tree. – Andre Artus Jun 05 '10 at 22:01
  • I have to correct myself: "-" immediatly followed by "[0-9]" becomes a number, any other instance of "-" is returned as the MINUS token. A simple RegEx that returns a stream of "tokens" would be: "(-?[0-9]+|[*+-/()]|[a-z][a-z0-9]+|<=|>=|<|>|=)". this handles identifiers and relational operators. It treats unspecified lexemes as whitespace. – Andre Artus Jun 14 '10 at 19:03
7

Here is a simple (naive operator precedence) grammar for what you want.

expression = 
    term
    | expression "+" term
    | expression "-" term .
term = 
    factor
    | term "*" factor
    | term "/" factor .
factor = 
    number
    | "(" expression ")" .

When you process "factor" you just check whether the next token is a number or "(", if it's a "(" then you parse "expression" again, when expression returns you check if the next token is ")". You could have the [calculated|read] values bubble up to the parent through the use of out or ref parameters, or build an expression tree.

Here is the same thing in EBNF:

expression = 
    term
    { "+" term | "-" term  } .

term = 
    factor
    { "*" factor | "/" factor }.

factor = 
    number
    | "(" expression ")" .
Andre Artus
  • 1,850
  • 15
  • 21
4

Take your pick, Code Golf: Mathematical expression evaluator (that respects PEMDAS)

Community
  • 1
  • 1
Chris Wagner
  • 20,773
  • 8
  • 74
  • 95
4

For anyone seeing this question nine years into the future from when this post was made: If you don't want to re-invent the wheel, there are many exotic math parsers out there.

There is one that I wrote years ago in Java, which supports arithmetic operations, equation solving, differential calculus, integral calculus, basic statistics, function/formula definition, graphing, etc.

Its called ParserNG and its free.

Evaluating an expression is as simple as:

    MathExpression expr = new MathExpression("(34+32)-44/(8+9(3+2))-22"); 
    System.out.println("result: " + expr.solve());

    result: 43.16981132075472

Or using variables and calculating simple expressions:

 MathExpression expr = new MathExpression("r=3;P=2*pi*r;"); 
System.out.println("result: " + expr.getValue("P"));

Or using functions:

MathExpression expr = new MathExpression("f(x)=39*sin(x^2)+x^3*cos(x);f(3)"); 
System.out.println("result: " + expr.solve());

result: -10.65717648378352

Or to evaluate the derivative at a given point(NOTE: It does symbolic differentiation(not numerical) behind the scenes, so the accuracy is not limited by the errors of numerical approximations):

MathExpression expr = new MathExpression("f(x)=x^3*ln(x); diff(f,3,1)"); 
System.out.println("result: " + expr.solve());

 result: 38.66253179403897

Which differentiates x^3 * ln(x) once at x=3. The number of times you can differentiate is 1 for now.

or for Numerical Integration:

MathExpression expr = new MathExpression("f(x)=2*x; intg(f,1,3)"); 
System.out.println("result: " + expr.solve());

result: 7.999999999998261... approx: 8

This parser is decently fast and has lots of other functionality.

DISCLAIMER: ParserNG is authored by me.

gbenroscience
  • 994
  • 2
  • 10
  • 33
2

Did you ever take a class on formal languages in school? Effectively you need a grammar to parse by.

EDIT: Oh crap, Wikipedia says I'm wrong, but now I forget the correct name :( http://en.wikipedia.org/wiki/Formal_grammar

jcolebrand
  • 15,889
  • 12
  • 75
  • 121
  • 1
    Infix notation needs a grammar. Postfix (RPN) can be parsed with a push-down automata which is much easier to implement than a grammar. – andand Jun 03 '10 at 21:03
  • 1
    Be careful not no equate (formal) grammar with context-free grammar. A regular language is also generated by a grammar, a regular one. – anno Jun 03 '10 at 23:09
  • @anno ~ aren't the basic math ops pretty well able to be described by a formal grammar? But yes, it was context-free that I was thinking initially, along with automata. – jcolebrand Jun 03 '10 at 23:13
2

Last year-ish I wrote a basic math evaluator for reasons I can't remember. It is not in any way a "proper" parser by any stretch of the term, and .. like all old code, I'm not that proud of it now.

But you can take a look and see if it helps you.

You run some input tests by launching this standalone Java app

Matt
  • 43,482
  • 6
  • 101
  • 102
2

When I wanted to parse something I decided to use the GOLD Parser:

  • Self-contained documentation (don't need a book to understand it)
  • Various run-time engines, in various programming languages including the one I wanted.

The parser includes sample grammars, including e.g. one for operator prcedence.


Apart from GOLD are also other more famous parsers, e.g. ANTLR, which I haven't used.

ChrisW
  • 54,973
  • 13
  • 116
  • 224
2

As many answers have already stated, the issue is that you need a recursive parser with associativity rules because you can end up with expressions like:

val = (2-(2+4+(3-2)))/(2+1)*(2-1)

and your parser needs to know that:

  1. The parenthetic expressions are evaluated from the inside out
  2. The division takes precedence over multiplication (you first divide, then multiply the result)
  3. The multiplication takes precedence over addition/subtraction

As you can imagine, writing a (good) parser is an art. The good thing is that there are several tools, called parser generators which allow you to easily define the grammar of your language, and the parsing rules. You may want to check the entries in Wikipedia for BNF, so that you can see how a grammar is defined.

Finally, if you are doing this for learning experience, go ahead. If this is for production code, do not reinvent the wheel, and find an existing library, otherwise you risk spending 1000 lines of code to add 2+2.

Escualo
  • 40,844
  • 23
  • 87
  • 135
  • I agree with points 1 & 2, but 3 seems unnecessary because division is the same as multiplying with the reciprocal (multiplicative inverse) [i.e. a / b = a * 1/b], and multiplication is associative [(a * b) * c = a * (b * c)] and commutative [a * b = b * a]. These properties, alongside the distributive property [a * (b + c) = (a * b) + (a * c)], are often used to optimize an expression. – Andre Artus Jun 07 '10 at 01:57
  • It seems I made a mistake in my comment: I agree that 1.The parenthetic expressions are evaluated from the inside out 3.The multiplication takes precedence over addition/subtraction It is with the 2nd item that I disagree (division takes precedence over multiplication) – Andre Artus Mar 22 '11 at 21:24
  • 1
    @Andre: but if division does not take precendence over multiplication, how would you do this: 6/2*4? How can you multiply by four something that you don't have the value for? Isn't it true that you first get the value 6/2 = 3, and then 3*4 = 12? As you mention, that "division is the same as multiplying by the inverse", in order to have the inverse you MUST have divided (otherwise, where is the inverse?) – Escualo Mar 25 '11 at 19:31
  • 1
    I think the term "precedence" is not correct, though, as they have the same "operator precedence", maybe it is "associativity"? – Escualo Mar 25 '11 at 19:34
  • 1
    @Arrieta: You are right, this is normally resolved by associativity: the PLUS, MINUS, MUL, and DIV operators are normally left associative. Thus 6/2*4 is essencially parsed as if it was (6/2)*4; 4*6/2 is parsed as (4*6)/2. If you define "^" as a exponent operator then use right association (recursion) e.g. 2^3^4 --> 2^(3^4). – Andre Artus Apr 05 '11 at 13:38
1

Essentially, you are asking us how to write a "parser." Here is another Stack Overflow question about parsers: hand coding a parser

Community
  • 1
  • 1
Heath Hunnicutt
  • 18,667
  • 3
  • 39
  • 62
1

I did something similar to what you describe. I use recursion to parse all the parenthesis. I then use a ternary tree to represent the different segments. The left branch is the left hand side of the operator. The center branch is the operator. The right branch is the right hand side of the operator.

Short Answer Recursion and ternary trees.

Vivian River
  • 31,198
  • 62
  • 198
  • 313
1

There is always an option to use math parser library, such as mXparser. You can:

1 - Check expression syntax

import org.mariuszgromada.math.mxparser.*;
...
...
Expression e = new Expression("2+3-");
e.checkSyntax();
mXparser.consolePrintln(e.getErrorMessage());

Result:

[mXparser-v.4.0.0] [2+3-] checking ...
[2+3-] lexical error 

Encountered "<EOF>" at line 1, column 4.
Was expecting one of:
    "(" ...
    "+" ...
    "-" ...
    <UNIT> ...
    "~" ...
    "@~" ...
    <NUMBER_CONSTANT> ...
    <IDENTIFIER> ...
    <FUNCTION> ...
    "[" ...

[2+3-] errors were found.

[mXparser-v.4.0.0]

2 - Evaluate expression

import org.mariuszgromada.math.mxparser.*;
...
...
Expression e = new Expression("2+3-(10+2)");
mXparser.consolePrintln(e.getExpressionString() + " = " + e.calculate());

Result:

[mXparser-v.4.0.0] 2+3-(10+2) = -7.0

3 - Use built-in functions constants, operators, etc..

import org.mariuszgromada.math.mxparser.*;
...
...
Expression e = new Expression("sin(pi)+e");
mXparser.consolePrintln(e.getExpressionString() + " = " + e.calculate());

Result:

[mXparser-v.4.0.0] sin(pi)+e = 2.718281828459045

4 - Define your own functions, arguments and constants

import org.mariuszgromada.math.mxparser.*;
...
...
Argument z = new Argument("z = 10");
Constant a = new Constant("b = 2");
Function p = new Function("p(a,h) = a*h/2");
Expression e = new Expression("p(10, 2)-z*b/2", p, z, a);
mXparser.consolePrintln(e.getExpressionString() + " = " + e.calculate());

Result:

[mXparser-v.4.0.0] p(10, 2)-z*b/2 = 0.0

5 - Tokenize expression string and play with expression tokens

import org.mariuszgromada.math.mxparser.*;
...
...
Argument x = new Argument("x");
Argument y = new Argument("y");
Expression e = new Expression("2*sin(x)+(3/cos(y)-e^(sin(x)+y))+10", x, y);
mXparser.consolePrintTokens( e.getCopyOfInitialTokens() );

Result:

[mXparser-v.4.0.0]  --------------------
[mXparser-v.4.0.0] | Expression tokens: |
[mXparser-v.4.0.0]  ---------------------------------------------------------------------------------------------------------------
[mXparser-v.4.0.0] |    TokenIdx |       Token |        KeyW |     TokenId | TokenTypeId |  TokenLevel |  TokenValue |   LooksLike |
[mXparser-v.4.0.0]  ---------------------------------------------------------------------------------------------------------------
[mXparser-v.4.0.0] |           0 |           2 |       _num_ |           1 |           0 |           0 |         2.0 |             |
[mXparser-v.4.0.0] |           1 |           * |           * |           3 |           1 |           0 |         NaN |             |
[mXparser-v.4.0.0] |           2 |         sin |         sin |           1 |           4 |           1 |         NaN |             |
[mXparser-v.4.0.0] |           3 |           ( |           ( |           1 |          20 |           2 |         NaN |             |
[mXparser-v.4.0.0] |           4 |           x |           x |           0 |         101 |           2 |         NaN |             |
[mXparser-v.4.0.0] |           5 |           ) |           ) |           2 |          20 |           2 |         NaN |             |
[mXparser-v.4.0.0] |           6 |           + |           + |           1 |           1 |           0 |         NaN |             |
[mXparser-v.4.0.0] |           7 |           ( |           ( |           1 |          20 |           1 |         NaN |             |
[mXparser-v.4.0.0] |           8 |           3 |       _num_ |           1 |           0 |           1 |         3.0 |             |
[mXparser-v.4.0.0] |           9 |           / |           / |           4 |           1 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          10 |         cos |         cos |           2 |           4 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          11 |           ( |           ( |           1 |          20 |           3 |         NaN |             |
[mXparser-v.4.0.0] |          12 |           y |           y |           1 |         101 |           3 |         NaN |             |
[mXparser-v.4.0.0] |          13 |           ) |           ) |           2 |          20 |           3 |         NaN |             |
[mXparser-v.4.0.0] |          14 |           - |           - |           2 |           1 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          15 |           e |           e |           2 |           9 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          16 |           ^ |           ^ |           5 |           1 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          17 |           ( |           ( |           1 |          20 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          18 |         sin |         sin |           1 |           4 |           3 |         NaN |             |
[mXparser-v.4.0.0] |          19 |           ( |           ( |           1 |          20 |           4 |         NaN |             |
[mXparser-v.4.0.0] |          20 |           x |           x |           0 |         101 |           4 |         NaN |             |
[mXparser-v.4.0.0] |          21 |           ) |           ) |           2 |          20 |           4 |         NaN |             |
[mXparser-v.4.0.0] |          22 |           + |           + |           1 |           1 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          23 |           y |           y |           1 |         101 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          24 |           ) |           ) |           2 |          20 |           2 |         NaN |             |
[mXparser-v.4.0.0] |          25 |           ) |           ) |           2 |          20 |           1 |         NaN |             |
[mXparser-v.4.0.0] |          26 |           + |           + |           1 |           1 |           0 |         NaN |             |
[mXparser-v.4.0.0] |          27 |          10 |       _num_ |           1 |           0 |           0 |        10.0 |             |
[mXparser-v.4.0.0]  ---------------------------------------------------------------------------------------------------------------

6 - You can find much more in mXparser tutorial, mXparser math collection and mXparser API definition.

7 - mXparser supports:

  • JAVA
  • .NET/MONO
  • .NET Core
  • .NET Standard
  • .NET PCL
  • Xamarin.Android
  • Xamarin.iOS

Additionally - this software is using mXparser as well - you can learn the syntax Scalar Calculator app.

Best regards

Leroy Kegan
  • 1,156
  • 10
  • 9
0

Taken from here [But, I've added the Division(/) functionality to it]

<html>
<body>
    <h1>how to write a parser - part2 </h1>
    Expression<input id='expression'>
    Result<input id='result'>
    <button onclick="parse()">PARSE</button>
</body>
<script>
    // split expression by operator considering parentheses
    const split = (expression, operator) => {
        const result = [];
        let braces = 0;
        let currentChunk = "";
        for (let i = 0; i < expression.length; ++i) {
            const curCh = expression[i];
            if (curCh == '(') {
                braces++;
            } else if (curCh == ')') {
                braces--;
            }
            if (braces == 0 && operator == curCh) {
                result.push(currentChunk);
                currentChunk = "";
            } else currentChunk += curCh;
        }
        if (currentChunk != "") {
            result.push(currentChunk);
        }
        return result;
    };
// Division
    const parseDivisionSeparatedExpression = (expression) => {
        const numbersString = split(expression, '/');
        const numbers = numbersString.map(noStr => {
            if (noStr[0] == '(') {
                const expr = noStr.substr(1, noStr.length - 2);
                // recursive call to the main function
                return parsePlusSeparatedExpression(expr);
            }
            return noStr;
        });
        const initialValue = 1.0;
        const result = numbers.reduce((acc, no) => {
            return acc / no
        });
        return result;
    };
    var res = (12 - 5-(5/2 + (32 + 4)) + (3*20))/2//12-5-38.5+60
    // this will only take strings containing * operator [ no + ]
    const parseMultiplicationSeparatedExpression = (expression) => {
        const numbersString = split(expression, '*');
        const numbers = numbersString.map(noStr => parseDivisionSeparatedExpression(noStr));

        const initialValue = 1.0;
        console.log("parseMultiplicationSeparatedExpression - numbers: ", numbers)
        const result = numbers.reduce((acc, no) => acc * no, initialValue);
        return result;
    };
    // both * -
    const parseMinusSeparatedExpression = (expression) => {
        const numbersString = split(expression, '-');
        const numbers = numbersString.map(noStr => parseMultiplicationSeparatedExpression(noStr));
        const initialValue = numbers[0];
        const result = numbers.slice(1).reduce((acc, no) => acc - no, initialValue);
        return result;
    };
    // * - +
    const parsePlusSeparatedExpression = (expression) => {
        const numbersString = split(expression, '+');
        const numbers = numbersString.map(noStr => parseMinusSeparatedExpression(noStr));
        const initialValue = 0.0;
        const result = numbers.reduce((acc, no) => acc + no, initialValue);
        return result;
    };
    const parse = () => {
        const expressionNode = document.getElementById('expression');
        const resultNode = document.getElementById('result');
        var expression = expressionNode.value;
        expression = expression.replace(/ +/g, '')
        console.log("parse - expression: ", expression)
        const result = parsePlusSeparatedExpression(expression, '+');
        resultNode.value = String(result);
    };
</script>

Example: 12 * 5+(5 * (32 - 4)) + 3 = 203