Convert a string with a math formula to an object tree?

Question

I am looking for a function that converts a math string passed as an argument (with the operations +, -, /, *), that returns an object that contains pieces of the math string, in an ordered/better way, which is easier to traverse.

Characteristics of the input:

is a string containing a formula
the formula doesn't have =, so it isn't an equation
there isn't any floating number, just integers
the integers can be positive or negative
no variables like x, y, z
can include parentheses

Test cases

Example 1: Basic Math (same operation)

N	Input (string)	Output (object)
1	1	`{ values: [1], operation: null }`
2	1+1	`{ values: [1,1], operation: "+" }`
3	1+2+3	`{ values: [1,2,3], operation: "+" }`
4	3-2-1	`{ values: [3,2,1], operation: "-" }`
5	10*80	`{ values: [10,80], operation: "*" }`
6	100/10	`{ values: [100,10], operation: "/" }`

Example 2: Formula with 2 operations

➡️ + and - samples

N1

input: 1+1-1

output:

{
    values: [
      {
        values: [1, 1],
        operation: "+",
      },
      1,
    ],
    operation: "-",
};

N2

input: 3+2-1+5

output:

{
    values: [
      {
        values: [
          {
            values: [3, 2],
            operation: "+",
          },
          1,
        ],
        operation: "-",
      },
      5,
    ],
    operation: "+",
};

N3

input: 3+2-1+5+10+7

output:

{
    values: [
      {
        values: [
          {
            values: [3, 2],
            operation: "+",
          },
          1,
        ],
        operation: "-",
      },
      5,
      10,
      7
    ],
    operation: "+",
};

➡️ + and / samples

N4

input: 1+2/3

output:

{
    values: [
      1,
      {
        values: [2, 3],
        operation: "/",
      },
    ],
    operation: "+",
};

N5

input: 2/3+1

output:

{
    values: [
      {
        values: [2, 3],
        operation: "/",
      },
      1,
    ],
    operation: "+",
};

N6

input: 1/2+3/4+5/6

output:

{
    values: [
      {
        values: [1, 2],
        operation: "/",
      },
      {
        values: [3, 4],
        operation: "/",
      },
      {
        values: [5, 6],
        operation: "/",
      },
    ],
    operation: "+",
};

N7

input: 1/2/3/4/5+6+7+8/9+10/11

output:

{
    values: [
      {
        values: [1, 2, 3, 4, 5],
        operation: "/",
      },
      6,
      7,
      {
        values: [8, 9],
        operation: "/",
      },
      {
        values: [10, 11],
        operation: "/",
      },
    ],
    operation: "+",
};

➡️ / and - samples

N8

input: 1-2/3

output:

{
    values: [
      1,
      {
        values: [2, 3],
        operation: "/",
      },
    ],
    operation: "-",
};

➡️ / and * samples

N9

input: 10/2*5

output:

{
    values: [
      {
        values: [10, 2],
        operation: "/",
      },
      5,
    ],
    operation: "*",
};

Example 3: Formula with 4 operations

N1

input: 10/2*5+1-1*5/3+2*4

output:

{
    values: [
      {
        values: [
          {
            values: [
              {
                values: [
                  {
                    values: [10, 2],
                    operation: "/",
                  },
                  5,
                ],
                operation: "*",
              },
              1,
            ],
            operation: "+",
          },
          {
            values: [
              {
                values: [1, 5],
                operation: "*",
              },
              3,
            ],
            operation: "/",
          },
        ],
        operation: "-",
      },
      {
        values: [2, 4],
        operation: "*",
      },
    ],
    operation: "+",
};

Example 4: Formula with () parenthesis

N1

input: 1+2*(3+2)

output:

{
    values: [
      1,
      {
        values: [
          2,
          {
            values: [3, 2],
            operation: "+",
          },
        ],
        operation: "*",
      },
    ],
    operation: "+",
};

N2

input: (1+2*3)*2

output:

{
    values: [
      {
        values: [
          1,
          {
            values: [2, 3],
            operation: "*",
          },
        ],
        operation: "+",
      },
      2,
    ],
    operation: "*",
};

N3

input: (1/1/10)+1/30+1/50

output:

{
    values: [
      {
        values: [1, 1, 10],
        operation: "/",
      },
      {
        values: [1, 30],
        operation: "/",
      },
      {
        values: [1, 50],
        operation: "/",
      },
    ],
    operation: "+",
  };

Other Cases

N1

input: -(1+2)

output:

{
    values: [
      {
        values: [1, 2],
        operation: "+",
      },
    ],
    operation: "-",
};

...

The grammar rules (order of operations) of your math string aren’t very clear. Consider `(1 / 1/10) + 1/30 + 1/50` how would that be written? — James, Dec 24 '22 at 17:30
@James yes, that what make the code very hard to me, I tried also recursion and so on but is very hard for me. thanks the same. — stackdeveloper, Dec 24 '22 at 18:14
I prefer code that can work with string without `()` and if there is just ignore them `.replace("(", "");` and `.replace(")", "");`, spaces need to be deleted by javascript by just a simple `.replace(" ", "");` — stackdeveloper, Dec 24 '22 at 18:17
@James if we sanitize the string from a set of characters like `space, (, )` maybe there is some regex or something that can work here... I don't know if makes sense, and sorry a lot for this question since I get downvoted — stackdeveloper, Dec 26 '22 at 16:42
The graph you posted in the question describes the expression `1 / (1/10 + 1/30 + 1/50)`, not `1 / 1/10 + 1/30 + 1/50`. — axiac, Dec 26 '22 at 17:14
@axiac yes that what I wanted like this 1 / (1/10 + 1/30 + 1/50), but without I add the () the script should know how to calculate that, sorry if I am wrong somewhere, maybe you are alright I dont know sorry — stackdeveloper, Dec 26 '22 at 17:16
It sounds like you are looking for an Abstract Syntax Tree, check that out, theres lots of information on the subject. — element11, Dec 26 '22 at 17:17
@element11 yes this is what I want, do you know a article or a tutorial that teach that? can you link to me :) thanks a lot — stackdeveloper, Dec 26 '22 at 17:19
@element11 I included now in the title and tag "Abstract Syntax Tree" so I can get better change to find a answer, thanks a lot my friend! — stackdeveloper, Dec 26 '22 at 17:22
Related : https://stackoverflow.com/questions/2705727/generate-syntax-tree-for-simple-math-operations — SioGabx, Dec 26 '22 at 17:48
In this case the parenthesis change the expression. Without them, the expression is equivalent to `(1/1/10) + 1/30 + 1/50`. Don't forget about the order of operations. — axiac, Dec 26 '22 at 18:15
Your example input and expected output DO NOT follow [normal math operator precedence rules](https://kevinsguides.com/guides/math/algebra/orderprecedence), nor any logic that I can figure out. If you want your question to get an answer, please make your question make sense. If not for the bounty, it would have been closed by now. — Inigo, Dec 27 '22 at 01:43
@SioGabx since we can't close this question as a duplicate, why don't you post the link to [the other question you found](https://stackoverflow.com/questions/2705727/generate-syntax-tree-for-simple-math-operations) and its accepted answer. A link rather than copying the code. I'll vote it up. — Inigo, Dec 27 '22 at 01:50
Parsing formulas is simple as long as you have a clear grammar. The only thing that can make it difficult are unary + and -, to make it simple you could say that - with no space afterwards is always unary (otherwise binary) and you don't have unary +. — maraca, Dec 28 '22 at 16:20
I don't understand. The asker takes the effort to place a bounty, but does not take the effort to edit their question so that at least it is consistent. It currently is contradicting itself as has been explained in above comments. Weird. — trincot, Dec 29 '22 at 17:26
Once you have dealt with the above comments, could you also clarify the following: are the constants always unsigned integers, or signed, or maybe decimal, maybe scientific notation, maybe hexadecimal/binary notations,...etc? Will you have a *unary* minus operator like in `-(1 + 2)`? Will there be *variable names*? — trincot, Dec 30 '22 at 20:58
@trincot yeah I will do it for sure, now I writed 7 test cases... I will try to add more test cases and also that. I completelly changed the logic to work like comment say. I will try to do my best. — stackdeveloper, Dec 30 '22 at 21:44
That is an improvement! Note that test case N4 and N5 are the same, and the last test case has a non-consecutive number (make it easy and remove the numbering). The logic is consistent, except for the very first case where the input is `1` (without operator). In that case it would be more consistent to just return `1`, just like the cases where there is one operator, and the two operands appear in a nested array. — trincot, Dec 30 '22 at 22:03
@trincot thanks, I will go and add more like how it will be with parenthesis and so on... thanks for spotting thr n4 n5 repetition, maybe I ctrl v wrong :) — stackdeveloper, Dec 30 '22 at 22:05
Concerning the most recent test case you added: `10/2*5+1-1*5/3+2*4`. The minus in the middle is a *binary* operator, so I would not expect it to end up as if there is an operand `-1`. That would make `1` and `-1` to be operands without any operator between them. That is not consistent. Are you sure about this? — trincot, Dec 30 '22 at 22:06
@trincot in my mind is like `[(1/2)*5]+1+[(-1*5)/3]+(2*4)`... is this wrong? the result should be the same I think. basically I tried to translate it in my mind like there is parenthesis and do the logic based on that — stackdeveloper, Dec 30 '22 at 22:10
@trincot edited, it just a typo `[(1/2)*5]+1+[(-1*5)/3]+(2*4)` — stackdeveloper, Dec 30 '22 at 22:15
But now you have more `+` than in the input. The `+` after the first `1` is not present in the input. In the input, the `-` is the binary operator between the two `1` operands, but now you introduce a `+` that was not there and convert the `-` to become part of the number literal. I don't find that very consistent. — trincot, Dec 30 '22 at 22:18
@trincot yeah this is what I thinked, but turn out that is the more easiest for me, so I don't need to create a extra {} object nested. the important that the result is the same or something like that... sorry if I use a lot of your time :( thanks for helping me — stackdeveloper, Dec 30 '22 at 22:21
Yes, but if we have to automate this, we should know what you expect as output, and as you have it now I don't see the logic when to invent an extra `+` and when not. — trincot, Dec 30 '22 at 22:22
@trincot for you how I can do the minus, seems very hard to me. can you suggest me a little change to this code `{ values: [ { values: [-1, 5], operation: "*", }, 3, ], operation: "/", },` I can't do it like operation, because `*` have the precedence, if is not was for the * or / it will be easy — stackdeveloper, Dec 30 '22 at 22:23
Well, if the input is `5 * -1`, then you'll have `[5, -1]` in an array, but if the input is `5 - 1`, you shouldn't change that *binary* `-` to a binary `+` and negate `1` to `-1`. Just deal with `-` as a binary operator. Otherwise it becomes unclear when to re-interpret binary minus operators as additions. — trincot, Dec 30 '22 at 22:25
@trincot sorry but I don't understand well, how is supposed to be in the object? seems hard to me, I know you are right, but can we use that logic if the result remain the same please. if there a simple alternative I will go for it — stackdeveloper, Dec 30 '22 at 22:29
So for `10/2*5+1-1*5/3+2*4`, the output could be like here: https://system.tips/text/view?q=ejvmhzpz — trincot, Dec 30 '22 at 22:31
@trincot fine is alright! I checked it very well, I will add it to the question.... thanks a lot — stackdeveloper, Dec 30 '22 at 22:35
Now, if you could also look at the clarifications about the numbers that are allowed (see earlier comment), unary `-` and maybe other kinds of literals? For instance, would this need to work? `-(5*3.1415)/1e-10+0.1*-.5+0b010101` If not, can you pinpoint what is possible and what not? — trincot, Dec 30 '22 at 22:39
@trincot for now I don't need named variables in the formula, or floating numbers, just integers that can be negative or positive. I will point also that tomorrow, now is late here... I will do it for sure promised (I am from italy and is like midnight)... I will add now 2 test cases for parenthesis and tomorrow complete. thanks a lot for your kindness really, you do a fantastic job, and thanks a lot for you guiding me — stackdeveloper, Dec 30 '22 at 22:41
OK, just need clarification for that first `-` in like `-(1+2+3)`: would that be a possibility? Or `1 / -(2+3)`... In short, is there a unary `-` operation? NB: I am in the same timezone as you :-) — trincot, Dec 30 '22 at 22:52
@trincot interesting same timezone... I will see if I can figure out a object also for that too — stackdeveloper, Dec 30 '22 at 22:56
Yes, it could be the same shape of object, but with a `values` array that just has one element. so `-(1+2)` would become `{ values: [{ values: [1,2], operation: "+"}], operation: "-"}` — trincot, Dec 30 '22 at 22:57
@trincot yes but it wasn't that if there is `-` outside... is like saying `-1*(1+2)` — stackdeveloper, Dec 30 '22 at 22:59
It is like saying that, but without multiplication. In mathematics the unary minus is a distinct operation. See [unary negative and positive](https://en.wikipedia.org/wiki/Unary_operation#Unary_negative_and_positive). — trincot, Dec 30 '22 at 23:00
@trincot yeah makes sense so we don't add a new number from nowhere. is right. I need to say good night to you, tomorrow we can do more and maybe solve and find a solution :) thanks a lot and sorry for using a lot of your time, you are really interested and helpful — stackdeveloper, Dec 30 '22 at 23:07
Is the real goal to evaluate the expression (get the numeric result)? Or do you really need the tree? — Olivier, Dec 31 '22 at 09:39
@Olivier Doesn't really make a difference for the implementation, does it? — Bergi, Dec 31 '22 at 09:49
@Bergi If the real goal is to get the numeric result, it's as simple as `eval("1+1")`. — Olivier, Dec 31 '22 at 09:58
@trincot wanna repost https://stackoverflow.com/a/47761792/1048572? — Bergi, Dec 31 '22 at 10:06
@Olivier Simply but [likely wrong](https://stackoverflow.com/q/18645153/1048572), and very insecure. [So](https://stackoverflow.com/q/5066824/1048572) [many](https://stackoverflow.com/q/11422513/1048572) [better](https://stackoverflow.com/q/6479236/1048572) [alternatives](https://stackoverflow.com/q/2276021/1048572) exist. — Bergi, Dec 31 '22 at 10:10
@Bergi I agree eval() has its shortcomings, but we don't know the OP's exact needs. Anyway I will add [PEG.js](https://pegjs.org/) to your alternatives. — Olivier, Dec 31 '22 at 10:18
@Bergi, I could repost [that](https://stackoverflow.com/a/47761792/1048572), but yesterday I posted a new answer [here](https://stackoverflow.com/a/74964290/5459839) that comes very close to what is being asked here, except for the slightly different output format, the additional rules for flattening operations by their left-associativity, and the weird output required for number-input only (first example). But little code is needed to adapt to those requirements. Not sure what is the right thing to do here. — trincot, Dec 31 '22 at 10:24
@trincot Ah, nice! I still would tend to wait until the bounty expires and then just close as a duplicate (unless the OP edits their question to include an attempt at writing a parser and need help with that). — Bergi, Dec 31 '22 at 10:28
@trincot We don't know if the OP's weird requirements are real or not. That's why I asked about the real goal. If the OP just wants to evaluate the expression, then the exact format of the AST doesn't matter. — Olivier, Dec 31 '22 at 10:37
This is simply solved with an AST. Your parser can recognize strings of digits, unary operators `+-` and binary operators `+-*/`. I don't see any benefit in considering poly-ary operators like in `1+2+3`. Alternatively, start with a lexical phase that will extract the strings of digits as integers. — , Dec 31 '22 at 14:49
thank you all! I found the solution now. you are amazing community :) — stackdeveloper, Dec 31 '22 at 17:57
@stackdeveloper If you found your own solution, [please post it as an answer](/help/self-answer) — Bergi, Dec 31 '22 at 18:04

Matt Timmermans · Accepted Answer · 2022-12-31T14:53:37.483

Here's a function that seems to pass all of your test cases.

Somehow I've written a lot of expression parsers, and I eventually settled on doing it this way in pretty much all cases. This is a recursive descent parser that is just about as simple as a fully-featured expression parser can be.

The secret is the parseTokens function's minPrec parameter, which tells it to parse until it encounters an operator with lower precedence. That makes it easy for the function to call itself recursively at each precedence level.

/**
 * Parse an expression into the required tree
 * @param str {string}
 */
function parseExpression(str) {
    // break string into tokens, in reverse order because pop() is faster than shift()
    const tokens = str.match(/[.0-9Ee]+|[^\s]/g).reverse();
    const tree = parseTokens(tokens, 0);
    if (tokens.length) {
        throw new Error(`Unexpected ${tokens.pop()} after expression`);
    }
    return tree;
}

const BINARY_PRECEDENCE = {
    '+': 0,
    '-': 0,
    '*': 1,
    '/': 1,
}

const UNARY_PRECEDENCE = {
    '+': 10,
    '-': 10,
}

/**
 * Given an array of tokens in reverse order, return binary expression tree
 * 
 * @param tokens {string[]} tokens
 * @param minPrec {number} stop at operators with precedence smaller than this
 */
function parseTokens(tokens, minPrec) {
    if (!tokens.length) {
        throw new Error('Unexpected end of expression');
    }

    // get the left operand
    let leftToken = tokens.pop();
    let leftVal;
    if (leftToken === '(') {
        leftVal = parseTokens(tokens, 0);
        if (tokens.pop() !== ')') {
            throw new Error('Unmatched (');
        }
    } else if (UNARY_PRECEDENCE[leftToken] != undefined) {
        const operand = parseTokens(tokens, UNARY_PRECEDENCE[leftToken]);
        if (typeof operand === 'number' && leftToken === '-') {
            leftVal = -operand;
        } else if (typeof operand === 'number' && leftToken === '+') {
            leftVal = operand;
        } else {
            leftVal = {
                operation: leftToken,
                values: [operand]
            }
        }
    } else {
        leftVal = Number(leftToken);
        if (isNaN(leftVal)) {
            throw new Error(`invalid number ${leftToken}`);
        }
    }

    // Parse binary operators until we hit the end or a stop
    while (tokens.length) {
        // end of expression
        const opToken = tokens.pop();
        const opPrec = BINARY_PRECEDENCE[opToken];
        if (opToken === ')' || (opPrec != undefined && opPrec < minPrec)) {
            // We have to stop here.  put the token back and return
            tokens.push(opToken);
            return leftVal;
        }
        if (opPrec == undefined) {
            throw new Error(`invalid operator ${opToken}`)
        }

        // we have a binary operator.  Get the right operand
        const operand = parseTokens(tokens, opPrec + 1);
        if (typeof leftVal === 'object' && leftVal.operation === opToken) {
            // Extend the existing operation
            leftVal.values.push(operand);
        } else {
            // Need a new operation
            leftVal = {
                values: [leftVal, operand],
                operation: opToken
            }
        }
    }
    return leftVal;
}

thanks this is fantastic job from you, I will learn a lot from your code, I also give you the bounty of 100points. have a nice day :) @matt — stackdeveloper, Dec 31 '22 at 17:40
if I can I want to give you more than 100 point, you do a fantastic job really. I started another bounty of 200, after 23hours SO site will give the permission to give you the another reward :) thanks a lot ! — stackdeveloper, Dec 31 '22 at 17:48
This also works for `---++--++-5`. I guess it should, but looks strange. — maraca, Jan 01 '23 at 03:26

trincot · Answer 2 · 2022-12-31T19:32:18.863

I will post my solution, which is based on this answer to a similar question.

That solution is based on a regular expression which tokenizes the input with named capture groups. Consequently the rest of the code does not have to deal with character comparisons and can focus on the grammar rules. The implementation shows no explicit notion of precedence rules, as they are implied by the grammar.

I repeat here the grammar rules, but see also the explanation in the above referenced answer:

<literal>    ::= [ '-' ] <digit> { <digit> } [ '.' { <digit> } ] ; no white space allowed
<operator2>  ::= '*' | '/'
<operator1>  ::= '+' | '-'
<factor>     ::= '-' <factor> | '(' <expression> ')' | <literal>
<term>       ::= [ <term> <operator2> ] <factor>
<expression> ::= [ <expression> <operator1> ] <term>

Here I extend that solution with the following specifics:

It produces the tree node format instead of the nested array
It keeps nodes flat when multiple operands can be combined with the same binary operator
It resolves double negations: For example, -(-(1+2)) will produce the same tree as 1+2.

You can try different inputs in the snippet below.

function parse(s) {
    // Create a closure for the two variables needed to iterate the input:
    const get = ((tokens, match=tokens.next().value) =>
            // get: return current token when it is of the required group, and move forward, 
            //      else if it was mandatory, throw an error, otherwise return undefined
            (group, mandatory) => {
                if (match?.groups[group] !== undefined) 
                    return [match?.groups[group], match = tokens.next().value][0];
                if (mandatory) 
                    throw `${s}\n${' '.repeat(match?.index ?? s.length)}^ Expected ${group}`;
            }
        )(  // Get iterator that matches tokens with named capture groups.
            s.matchAll(/(?<number>\d+(?:\.\d*)?)|(?<open>\()|(?<close>\))|(?<add>\+|(?<unary>-))|(?<mul>[*\/])|(?<end>$)|\S/g)
        ),
        // negationRule: Removes double negation; toggles sign when number is negated
        negationRule = (a, b=a.values?.[0]) =>
            a.values?.length + (b?.values?.length ?? 1) == 2 ? (b.values?.[0] ?? -b) : a,
        // associativityRule: Flattens operation when first operand has same binary operation
        associativityRule = (a, b=a.values?.[0]) =>
            a.operation === b?.operation && a.values?.length > 1 && b.values.length > 1
                ? {...a, values: [...b.values, ...a.values.slice(1)] } : a,
        // node: Creates a tree node from given operation, applying simplification rules
        node = (operation, ...values) => 
            negationRule(associativityRule({ values, operation })),
        // Grammar rules implementation, using names of regex capture groups, returning nodes
        factor = (op=get("unary")) => 
            op ? node(op, factor()) : get("open") ? expr("close") : +get("number", 1),
        term = (arg=factor(), op=get("mul")) => 
            op ? term(node(op, arg, factor())) : arg,
        expr = (end, arg=term(), op=get("add")) => 
            op ? expr(end, node(op, arg, term())) : (get(end, 1), arg);
    return expr("end");
}


// I/O Management

const [input, select, output] = document.querySelectorAll("input, select, pre");
(input.oninput = () => {
    if (input.value != select.value) select.value = "";
    try {
        output.textContent =  JSON.stringify(parse(input.value), null, 2)
            // put integer-arrays on one line:
            .replace(/\[[^{\]]+\]/g, m => m.replace(/\s+/g, "").replace(/,/g, ", "));
    } catch(err) {
        output.textContent = err;
    }
})();
select.onchange = () => (input.value = select.value, input.oninput());

input { width: 30em; margin-bottom: 10px; }

Type a math expression or choose one from drop down:<br>
<input value="1 / (1/10 + 1/30 + 1/50)">
<select>
    <option>
    <option>1
    <option>1+1
    <option>1+2+3
    <option>3-2-1
    <option>10*80
    <option>100/10
    <option>1+1-1
    <option>3+2-1+5
    <option>3+2-1+5+10+7
    <option>1+2/3
    <option>2/3+1
    <option>1/2+3/4+5/6
    <option>1/2/3/4/5+6+7+8/9+10/11
    <option>1-2/3
    <option>10/2*5
    <option>10/2*5+1-1*5/3+2*4
    <option>1+2*(3+2)
    <option>(1+2*3)*2
    <option>(1/1/10)+1/30+1/50
    <option>-(1+2)
    <option>1/-(-(1+2))
</select>
<pre></pre>

thanks a lot, @trincot, I will upvote this, you do an amazing job too. but I already accepted matt answer. I remember you help me too yesterday but I already upvote him. but I upvote you. hope this will be helpful to the stackoverflow community too that will see this answer in the future :) — stackdeveloper, Dec 31 '22 at 18:11

Convert a string with a math formula to an object tree?

Characteristics of the input:

Test cases

Example 1: Basic Math (same operation)

Example 2: Formula with 2 operations

➡️ + and - samples

N1

N2

N3

➡️ + and / samples

N4

N5

N6

N7

➡️ / and - samples

N8

➡️ / and * samples

N9

Example 3: Formula with 4 operations

N1

Example 4: Formula with () parenthesis

N1

N2

N3

Other Cases

N1

2 Answers2