-2

I have a calculator and I don't want it to start calculating until the expression seems to be correct. "Seems to be correct" means that it should not contain any non-math symbols such as $, #, etc. I don't care about logical validity such as parentheses balancing or missing operands, just invalid characters. I use a client-server approach. To accomplish this I want to use a regex (it could be provided with a list of available operations).

For example:

  • 3 + 10 - correct
  • tan(45 * PI / 180) - correct
  • 5 % 10 - correct
  • 3 + # - incorrect
  • 3 + - correct
  • 5 + 3 * ( 2 - also correct, symbols are perfectly valid

I tried to use a regex built from the available operations' symbols, but here are some complications I encountered:

  1. An operation's symbol can vary in length. It could be either a single symbol or a function name, so the two cases need to be handled separately for the regex to work correctly. I was using a character class: [\+\-tan] will not work as intended, because it will match any letter from tan, but I need to only match the whole tan part.
  2. Depending on the list of available operations does not seem like a good idea to me; I need a more general way to test an expression in case I want to use it elsewhere.
  3. The main problem with my regex was that as soon as it encountered a single character from the class, it reported the expression as correct, regardless of any invalid characters that followed.
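
To make complications 1 and 3 concrete, here is a minimal sketch of the behaviour described. The variable names and the anchored pattern at the end are illustrative assumptions, not the regex actually used in the calculator.

```javascript
// A character class matches ONE character from the set,
// so "tan" is not treated as a unit (complication 1).
const charClass = /[\+\-tan]/;
// Alternation, by contrast, matches "tan" as a whole, or a single + or -.
const alternation = /(?:tan|[+-])/;

const classMatchesLoneA = charClass.test("a");   // "a" is in the set
const altMatchesLoneA = alternation.test("a");   // "a" alone is not "tan"

// Without ^...$ anchors, test() succeeds as soon as ANY one character
// matches, so the "#" goes unnoticed (complication 3).
const unanchoredAcceptsHash = charClass.test("3 + #");

// Anchoring and repeating the class forces EVERY character to belong to it.
const anchoredAcceptsHash = /^[\d\s+\-]*$/.test("3 + #");
const anchoredAcceptsSum = /^[\d\s+\-]*$/.test("3 + 10");

console.log({ classMatchesLoneA, altMatchesLoneA, unanchoredAcceptsHash, anchoredAcceptsHash, anchoredAcceptsSum });
```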
Ryszard Czech
medreres
  • A correct expression needs to follow more rules than you list. For instance, the parentheses should be balanced. You need to *parse* the expression, and a regular expression is not the right tool to do it all, as in JavaScript there is no support for recursion in regexes. – trincot Mar 29 '23 at 08:48
  • Parse it and evaluate it. If an error occurs, it would be invalid, no? – InSync Mar 29 '23 at 08:49
  • See for instance [this answer on a calculator question](https://stackoverflow.com/questions/6479236/calculate-string-value-in-javascript-not-using-eval/47761792#47761792): that code either evaluates the expression or throws an error, and it explains how you can add more operators and functions. – trincot Mar 29 '23 at 08:50
  • @trincot, the problem is that I have a server and a client, and I don't want to post an unnecessary request if the expression is invalid. As for parentheses balancing, it's more a matter of logical validity, but I need lexical validity – medreres Mar 29 '23 at 09:29
  • You mean you only want to check individual characters, not how they combine together? – trincot Mar 29 '23 at 09:30
  • @trincot exactly – medreres Mar 29 '23 at 09:30
  • So then `t+n3a)(8` would be accepted? – trincot Mar 29 '23 at 09:31
  • @trincot could be, but I have no idea how to exclude characters like # – medreres Mar 29 '23 at 09:32
  • Just list the characters you *do* want to accept and put them in a character class. That's all then. – trincot Mar 29 '23 at 09:33
  • But this contradicts what you wrote in the question: *"it will match any letter from `tan`, but I need to only match the whole `tan` part"*. That means you **do** want to validate how characters are combined to make valid expressions. And that brings us back to syntax validation. – trincot Mar 29 '23 at 09:34
  • @trincot I said it **could** be the case – medreres Mar 29 '23 at 09:39
  • You wrote *"will not work as intended"*: So now I am lost as to what you really intend. – trincot Mar 29 '23 at 09:44
  • @trincot Sorry for misleading you. My main intention was to check whether all of the symbols in the expression are legitimate. I could make do with just checking whether there are invalid symbols, without caring about where they are placed, but I have no clue how to do that. I can't just take all the invalid symbols and check if they are present in the expression – medreres Mar 29 '23 at 09:56

2 Answers

1

As far as I understand, you're only interested in checking that each individual character belongs to a closed set of valid characters.

From the examples you've given that set of characters consists of:

  • digits, and point (for decimal separator)
  • operators: +-*/%
  • letters from the English alphabet, to be used in tan, PI, etc. (in regex \w, which also covers digits and underscore)
  • Parentheses
  • white space (in regex \s)

Combining that we get this character class in regex syntax:

[\w\s.()+*\/%-]

NB: put the - as last so it doesn't get interpreted as a range separator. And escape the / with a backslash so in JavaScript it isn't interpreted as the end of a regex literal.
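
As a small illustration of why the position of the hyphen matters, the sketch below compares a class where the hyphen accidentally forms a range with one where it comes last. The variable names are made up for the example.

```javascript
// "*-\/" in the middle of a class is a RANGE from "*" (U+002A) to "/" (U+002F),
// which silently includes "+", ",", "-" and "." as well.
const rangeByAccident = /^[*-\/]$/;
// With the hyphen last, the class contains exactly "*", "/" and "-".
const literalHyphen = /^[*\/-]$/;

const accidentAcceptsComma = rangeByAccident.test(",");
const literalAcceptsComma = literalHyphen.test(",");
const literalAcceptsHyphen = literalHyphen.test("-");

console.log({ accidentAcceptsComma, literalAcceptsComma, literalAcceptsHyphen });
```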

So to validate, you could check whether there is any character in the input that is not in this class (using [^). If so, reject the input.

// Returns true when s contains at least one character outside the allowed set.
const isInvalid = s => /[^\w\s.()+*\/%-]/.test(s);

const tests = [
    "3 + 10",
    "tan(45 * PI / 180)",
    "5 % 10",
    "3 + #",
    "3 +",
    "5 + 3 * ( 2 -",
];

for (const test of tests) {
    console.log(test, isInvalid(test) ? "invalid" : "correct");
}
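
Since the question mentions that the regex could be provided with a list of available operations, here is a rough sketch of how the same character class might be built from such a list. The helper name `makeCharValidator` and the operator array are assumptions made for illustration. Function names such as tan are already covered by \w, so only single-character operators need to be passed in.

```javascript
// Hypothetical helper: builds a validator from a list of single-character operators.
function makeCharValidator(operators) {
    // Inside a character class only \ ] ^ and - are special, so escape just those.
    const escaped = operators
        .map(op => op.replace(/[\\\]^-]/g, "\\$&"))
        .join("");
    // Anchored with ^...$ so that EVERY character must belong to the class.
    const allowed = new RegExp(`^[\\w\\s.()${escaped}]*$`);
    return s => allowed.test(s);
}

const isValid = makeCharValidator(["+", "-", "*", "/", "%"]);

for (const test of ["3 + 10", "tan(45 * PI / 180)", "3 + #", "2 ^ 3"]) {
    console.log(test, isValid(test) ? "correct" : "invalid");
}
```

Unlike isInvalid above, this version returns true for acceptable input, and adding a new operator such as ^ only requires extending the array.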

Parsing

It is clear that the above does not prevent the user from entering invalid expressions. Expressions have (recursive) grammar rules that cannot be checked by regex (alone).

Yet, you could do that with a parser. Just for reference, here is one I wrote in answer to another question: it allows you to define the list of operators and functions you want to support, and it either returns the calculated value or throws an error when the syntax is not correct.
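
If a full parser is more than you need on the client, a middle ground is to check the input token by token instead of character by character. The following is only a sketch under assumptions: the `KNOWN_NAMES` list, the token pattern and the function name `hasOnlyKnownTokens` are made up for this example and are not taken from the parser mentioned above. It rejects unknown names such as `t` or `n3a`, but still does not check how tokens combine.

```javascript
// Hypothetical list of supported names; adjust to what the calculator offers.
const KNOWN_NAMES = new Set(["tan", "sin", "cos", "PI"]);

// One token per match: whitespace, number, name, or operator/parenthesis.
// The "y" (sticky) flag forces each match to start exactly at lastIndex.
const TOKEN = /\s+|(?:\d+\.?\d*|\.\d+)|[A-Za-z_]\w*|[()+*\/%-]/y;

function hasOnlyKnownTokens(s) {
    TOKEN.lastIndex = 0;
    while (TOKEN.lastIndex < s.length) {
        const m = TOKEN.exec(s);
        if (m === null) return false;          // no token starts at this position
        const text = m[0];
        // A token starting with a letter or underscore must be a known name.
        if (/^[A-Za-z_]/.test(text) && !KNOWN_NAMES.has(text)) return false;
    }
    return true;
}

for (const test of ["tan(45 * PI / 180)", "5 + 3 * ( 2 -", "3 + #", "t+n3a)(8"]) {
    console.log(test, hasOnlyKnownTokens(test) ? "correct" : "invalid");
}
```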

trincot
0

Don't do this: a regex is not a lexer. You should use a lexer instead. See this for more info: https://stackoverflow.com/a/1732454/21517472

willy
