2

I have this trouble: I must verify the correctness of many mathematical expressions especially check for consecutive operators + - * /. For example:

6+(69-9)+3

is ok while

6++8-(52--*3)

no. I am not using the library <regex> since it is only compatible with C++11. Is there a alternative method to solve this problem? Thanks.

Local Hero
  • 493
  • 2
  • 7
  • 20
  • What about subtracting a negative number? `5 - -2`. That is a perfectly valid operation. – PaulMcKenzie Jan 19 '16 at 22:06
  • No, my trouble is that the expressions are in a text file and I must verify the correctness of some expressions kind 5-----6+3*99 – Local Hero Jan 19 '16 at 22:12
  • 3
    You can't solve this with regular expressions. Write a parser. – user207421 Jan 19 '16 at 22:17
  • 2
    @LocalHero You need to formally get the rules by whoever gave you this assignment, as to what is a valid mathematical expression. I gave you the case of two consecutive minus symbols, only separated by white space, as being valid, which (if you ignore the white space) goes against your request for checking for consecutive operators. – PaulMcKenzie Jan 19 '16 at 22:25
  • @LocalHero What about mismatch parentheses? Having said that, I'm sure you can just use one of the many, already written parsers that work with mathematical expressions using the basic operators and parentheses. For example: http://muparser.beltoforion.de/mup_example.html#idExample If you really want to do this yourself properly, as others mentioned, you need a parser, built from a grammar describing a valid mathematical expression. – PaulMcKenzie Jan 19 '16 at 22:44
  • Without regex the only way I see for you to verify input content item by item is in the style of a lexical analyzer, and that's a very painful thing. Literally you will have to step between values and expressions, checking each as you go. Perhaps there is a regex library compatible with whichever dated C++ edition you are programming in that can be marshaled for the task. – Hardryv Jan 19 '16 at 23:09
  • This is a classic example of a recursive descent parser. – erip Jan 19 '16 at 23:10
  • Is it possible use this: "espressione.erase(unique(espressione.begin(), espressione.end()), espressione.end());" only for some characters (+ - * /)? – Local Hero Jan 19 '16 at 23:14
  • You have some good answers, the common element to all of them being "you can't do it using regex". Since you flagged it as C++, Stroustrup implemented exactly what you are seeking over an entire chapter in his 'The C++ Programming Language' (the "special" 3rd edition) -- this isn't based on C++11 (which appears to be a good thing for you?). I do not know if the latest edition has that or not. – Happy Green Kid Naps Jan 19 '16 at 23:34

5 Answers5

5

You can use a regular expression to verify everything about a mathematical expression except the check that parentheses are balanced. That is, the regular expression will only ensure that open and close parentheses appear at the point in the expression they should appear, but not their correct relationship with other parentheses.

So you could check both that the expression matches a regex and that the parentheses are balanced. Checking for balanced parentheses is really simple if there is only one type of parenthesis:

bool check_balanced(const char* expr, char open, char close) {
  int parens = 0;
  for (const char* p = expr; *p; ++p) {
    if (*p == open) ++parens;
    else if (*p == close && parens-- == 0) return false;
  }
  return parens == 0;
}

To get the regular expression, note that mathematical expressions without function calls can be summarized as:

BEFORE* VALUE AFTER* (BETWEEN BEFORE* VALUE AFTER*)*

where:

  • BEFORE is sub-regex which matches an open parenthesis or a prefix unary operator (if you have prefix unary operators; the question is not clear).

  • AFTER is a sub-regex which matches a close parenthesis or, in the case that you have them, a postfix unary operator.

  • BETWEEN is a sub-regex which matches a binary operator.

  • VALUE is a sub-regex which matches a value.

For example, for ordinary four-operator arithmetic on integers you would have:

  • BEFORE: [-+(]

  • AFTER: [)]

  • BETWEEN: [-+*/]

  • VALUE: [[:digit:]]+

and putting all that together you might end up with the regex:

^[-+(]*[[:digit:]]+[)]*([-+*/][-+(]*[[:digit:]]+[)]*)*$

If you have a Posix C library, you will have the <regex.h> header, which gives you regcomp and regexec. There's sample code at the bottom of the referenced page in the Posix standard, so I won't bother repeating it here. Make sure you supply REG_EXTENDED in the last argument to regcomp; REG_EXTENDED|REG_NOSUB, as in the example code, is probably even better since you don't need captures and not asking for them will speed things up.

rici
  • 234,347
  • 28
  • 237
  • 341
4

You can loop over each charin your expression.

If you encounter a + you can check whether it is follow by another +, /, *...

Additionally you can group operators together to prevent code duplication.

int i = 0
while(!EOF) {
    switch(expression[i]) {
         case '+':
         case '*': //Do your syntax checks here
    }
    i++;
}
4

Well, in general case, you can't solve this with regex. Arithmethic expressions "language" can't be described with regular grammar. It's context-free grammar. So if what you want is to check correctness of an arbitrary mathemathical expression then you'll have to write a parser.

However, if you only need to make sure that your string doesn't have consecutive +-*/ operators then regex is enough. You can write something like this [-+*/]{2,}. It will match substrings with 2 or more consecutive symbols from +-*/ set.

Or something like this ([-+*/]\s*){2,} if you also want to handle situations with spaces like 5+ - * 123

Dim_ov
  • 1,421
  • 9
  • 14
  • 1
    You don't need to backslash-escape regex operators inside a character class. If you're using a posix regex library, that will add the backslash to the character class, so it's not only unnecessary but incorrect. However, you must put the `-` either at the beginning or the end. So `[-+*/]` will work fine. – rici Jan 19 '16 at 22:51
  • Note that a series of + and - are valid operators, as they're both unary operators too. + maintains sign, and - flips the sign. Unary + is rare, but valid. – Donnie Jan 19 '16 at 23:07
  • @Donnie, according to the second example in author's question (`6++8` part) series of `+` and `-` are not ok. And anyway, if we want to have a proper validator, then we need a parser. All these regex-only solutions are just dirty hacks for some limited cases. – Dim_ov Jan 19 '16 at 23:12
  • @dim_ov, We don't know if the 6++8 was invalid (it shouldn't be), or if the 2nd half of the equation (52--*3) is what is making the whole thing invalid (that is invalid). Either way, you're totally right about the regex hacks. – Donnie Jan 20 '16 at 00:00
3

Well, you will have to define some rules if possible. It's not possible to completely parse mathamatical language with Regex, but given some lenience it may work.

The problem is that often the way we write math can be interpreted as an error, but it's really not. For instance:

 5--3 can be 5-(-3)

So in this case, you have two choices:

  • Ensure that the input is parenthesized well enough that no two operators meet
  • If you find something like --, treat it as a special case and investigate it further

If the formulas are in fact in your favor (have well defined parenthesis), then you can just check for repeats. For instance:

--
+-
+*
-+

etc.

If you have a match, it means you have a poorly formatted equation and you can throw it out (or whatever you want to do).

You can check for this, using the following regex. You can add more constraints to the [..][..]. I'm giving you the basics here:

[+\-\*\\/][+\-\*\\/]

which will work for the following examples (and more):

6++8-(52--*3)
6+\8-(52--*3)
6+/8-(52--*3)

An alternative, probably a better one, is just write a parser. it will step by step process the equation to check it's validity. A parser will, if well written, 100% accurate. A Regex approach leaves you to a lot of constraints.

Serguei Fedorov
  • 7,763
  • 9
  • 63
  • 94
2

There is no real way to do this with a regex because mathematical expressions inherently aren't regular. Heck, even balancing parens isn't regular. Typically this will be done with a parser.

A basic approach to writing a recursive-descent parser (IMO the most basic parser to write) is:

  1. Write a grammar for a mathematical expression. (These can be found online)
  2. Tokenize the input into lexemes. (This will be done with a regex, typically).
  3. Match the expressions based on the next lexeme you see.
  4. Recurse based on your grammar

A quick Google search can provide many example recursive-descent parsers written in C++.

Community
  • 1
  • 1
erip
  • 16,374
  • 11
  • 66
  • 121