3

I am trying to parse numerical comparisons in string form. I want to tokenize a string such as 45+(30*2)<=50 so that the resulting groups are 45, +, (30*2), <=, and 50.

I know I can define my groups as

  • \w* for the numerical terms
  • \(.*\) for the parenthetical terms
  • [\+\-\*\\=<>]{1,2} for the operator terms

but I don't know how to say "A numerical or parenthetical term followed by an operational term, and that whole thing repeated any number of times, ending in a numerical or parenthetical term".

Is such a thing possible with regex?

Dan Brenner
  • 1
    Not entirely on topic but you should use `\d` for numbers instead of `\w` – Alexander Derck Feb 05 '16 at 07:48
  • 1
    [`(\([^)]+\)|\d+|\D)`](https://regex101.com/r/eQ9fB3/1) – Tushar Feb 05 '16 at 07:49
  • 2
    I presume it is a math expression you need to parse? There are a lot of libraries that do this for you; take a look at http://stackoverflow.com/questions/3972854/parse-math-expression – Thorarins Feb 05 '16 at 08:00

2 Answers

1

A regular expression isn't exactly the best tool for the job. You can achieve what you want with them, but you'll have to jump through hoops.

The first hoop is nested constructs such as 45+((10 + 20)*2)<=50, so let's deal with those first. \(.*\) won't do you any good here: it's greedy and unaware of nesting.
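To make the problem concrete, here is a small sketch. It is written in Python rather than .NET (my choice for illustration, not something from the question), but greedy and lazy quantifiers behave the same way in both engines for these patterns. The sample strings and the helper name are made up.

```python
import re

def first_match(pattern, text):
    """Return the text of the first match of pattern in text, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# Two sibling groups: the greedy .* runs to the LAST ')' and swallows the '*'.
greedy_siblings = first_match(r"\(.*\)", "(1+2)*(3+4)")

# A nested group followed by another group: same problem, one oversized match.
greedy_nested = first_match(r"\(.*\)", "45+((10+20)*2)<=(50)")

# Switching to a lazy quantifier does not help: it stops at the FIRST ')',
# which leaves the outer parenthesis unclosed.
lazy_nested = first_match(r"\(.*?\)", "45+((10+20)*2)<=50")

print(greedy_siblings)  # (1+2)*(3+4)
print(greedy_nested)    # ((10+20)*2)<=(50)
print(lazy_nested)      # ((10+20)
```

Neither variant can know which closing parenthesis belongs to which opening one, which is why some form of counting is needed.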

Here's a better pattern for parentheses only (it is laid out over several lines, so it needs RegexOptions.IgnorePatternWhitespace):

(?>
    (?<p>\()
    |(?<-p>\))
    |(?(p)[^()])
)+
(?(p)(?!))

Yes, that's what it takes. Read about balancing groups for an in-depth explanation of this.

Numerical terms would be matched by \d+, or by [0-9]+ if you want ASCII-only digits (in .NET, \d also matches other Unicode digits), not by \w+.
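A quick way to see the difference between the three classes is sketched below. It uses Python's re module, which is an assumption on my part; for str patterns it treats \w and \d as Unicode-aware, much like .NET does by default. The helper name and the test strings are invented for the example.

```python
import re

def matches_fully(pattern, text):
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None

# \w covers letters, digits and the underscore, so it accepts non-numbers.
word_accepts_identifier = matches_fully(r"\w+", "abc_1")

# \d only accepts digits, so the same input is rejected.
digit_accepts_identifier = matches_fully(r"\d+", "abc_1")

# \d is Unicode-aware: these are the Arabic-Indic digits three and four.
digit_accepts_arabic_indic = matches_fully(r"\d+", "\u0663\u0664")

# [0-9] is strictly ASCII, so it rejects them.
ascii_accepts_arabic_indic = matches_fully(r"[0-9]+", "\u0663\u0664")

print(word_accepts_identifier, digit_accepts_identifier)
print(digit_accepts_arabic_indic, ascii_accepts_arabic_indic)
```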

As for your question:

"A numerical or parenthetical term followed by an operational term, and that whole thing repeated any number of times, ending in a numerical or parenthetical term"

You're approaching this the wrong way. While you could do just that with PCRE regexes, which support recursive patterns, it'll be much harder in .NET.

You can use regexes for lexing (aka tokenizing), but then use application code to make sense of the tokens the regex returns. Don't use regex for semantics; you won't end up with pretty code.
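As a rough illustration of that split, the sketch below lets a regex do nothing but produce flat tokens, and leaves the grouping of parenthesised terms to a few lines of ordinary code that count nesting depth. It is in Python rather than .NET (my assumption, for brevity), and the names TOKEN, lex and group_top_level are hypothetical.

```python
import re

# One alternative per token kind. Whitespace is skipped because nothing matches
# it; note that findall also silently skips any other unrecognised character,
# which a stricter lexer would report as an error.
TOKEN = re.compile(r"\d+|[<>=!]=?|[-+*/]|[()]")

def lex(text):
    """Split the input into flat tokens: numbers, operators, single parentheses."""
    return TOKEN.findall(text)

def group_top_level(tokens):
    """Merge every balanced parenthesised run into one term; keep the rest as-is."""
    terms, current, depth = [], [], 0
    for tok in tokens:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        if depth > 0 or tok == ")":
            current.append(tok)          # still inside (or just closing) a group
            if depth == 0:               # the outermost '(' has been closed
                terms.append("".join(current))
                current = []
        else:
            terms.append(tok)            # top-level number or operator
    if depth != 0:
        raise ValueError("unbalanced '('")
    return terms

print(group_top_level(lex("45+(30*2)<=50")))
# ['45', '+', '(30*2)', '<=', '50']
print(group_top_level(lex("45+((10 + 20)*2)<=50")))
# ['45', '+', '((10+20)*2)', '<=', '50']
```

The depth counter does the job that the balancing group does inside the regex, but it is easier to read and gives you a natural place to report unbalanced input.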

Perhaps you should use an existing math parsing library, such as NCalc.

Or you may need to go with a custom solution and build your own parser...
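If you do go the custom route, a recursive-descent parser is one common shape for it. The following is only a minimal sketch under my own assumptions: it is in Python rather than .NET, it handles non-negative integers, + - * /, parentheses and at most one comparison, it spells equality as == (a lone = is rejected), and all names in it are hypothetical.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+|[<>=!]=|[<>]|[-+*/()])")

COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
           ">=": operator.ge, "==": operator.eq, "!=": operator.ne}
ADD = {"+": operator.add, "-": operator.sub}
MUL = {"*": operator.mul, "/": operator.truediv}

def tokenize(text):
    """Lex the whole input, failing loudly on anything unrecognised."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def comparison(self):            # comparison := sum [compare_op sum]
        left = self.sum()
        if self.peek() in COMPARE:
            return COMPARE[self.take()](left, self.sum())
        return left

    def sum(self):                   # sum := product {('+' | '-') product}
        value = self.product()
        while self.peek() in ADD:
            value = ADD[self.take()](value, self.product())
        return value

    def product(self):               # product := primary {('*' | '/') primary}
        value = self.primary()
        while self.peek() in MUL:
            value = MUL[self.take()](value, self.primary())
        return value

    def primary(self):               # primary := number | '(' sum ')'
        tok = self.take()
        if tok is not None and tok.isdigit():
            return int(tok)
        if tok == "(":
            value = self.sum()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return value
        raise ValueError(f"unexpected token {tok!r}")

def evaluate(text):
    parser = Parser(tokenize(text))
    result = parser.comparison()
    if parser.peek() is not None:
        raise ValueError(f"unexpected token {parser.peek()!r}")
    return result

print(evaluate("45+(30*2)<=50"))         # False, since 105 <= 50 does not hold
print(evaluate("45+((10 + 20)*2)>=50"))  # True
```

Nesting falls out of the recursion in primary, so there is no need for balancing groups at all; the regex is only used to cut the input into tokens.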

Lucas Trzesniewski
0

I hope the regex below gives you what you expect. Note that it only handles parentheses that are not nested:

([><!=]=?|[+\-*/]|\([^()]*\)|\d+| +)
Thanga