0

I created a regular expression:

((\(\s*)          #match start parens
|(\d+\.?\d*)      #match a number
|([+-/%^!])       #match operators
|([*])            #match the asterisk operator
|(\s*\)))+        #match the end parens

that is supposed to separate parentheses, numbers (integers and decimal (3 and 6.28)), and operators (+-/*^%!). I have tried a few tests

( (2 3 +) 6.28 +)  
(3.14 6.28 +)
( (3 4 +) (5 6 +) *)

and I have noticed a few things. When I run the regular expression on expressions with two start parens, it seems to ignore one of the parentheses, and testing on the site seems to yield many instances of null and repetition of characters. Is there a way to match a valid expression and assign that to it's own group? For example, if I have the expression ( (2 3 +) 6.28 +), the groups generated would be: [(, (, 2, 3, +, 6.28, +, )]?

I remember one user posted an answer here that used a python regular expression, and it worked like a charm. The expression used something like (?.) or (.?) and the rest of my expressions. Unfortunately, I neglected to copy it down and the answer has been deleted. After that I have tried tweaking it quite a bit but nothing has worked. Any extra help is appreciated.

zx81
  • 41,100
  • 9
  • 89
  • 105
swarajd
  • 977
  • 1
  • 10
  • 18
  • 2
    Regular expressions are not well suited for parsing recursive structures like this. – p.s.w.g Jan 14 '14 at 17:01
  • At your last example, 2 opening parenthesis and one closing? – revo Jan 14 '14 at 17:08
  • This can be useful: http://activedeveloper.info/rpn-calculator-in-php.html At least if you need an universal reverse polish calculator. This is in php, but it can be good for an example anyway. – Lajos Veres Jan 14 '14 at 17:09

1 Answers1

2

Just for the sport, I looked at your question both for the RPN case and the non-RPN case. Here are the recursive regexes I came up with (Perl, PCRE, Python with regex module), with a couple of caveats:

  1. I worked on the matching part of your question, not the tokenizing part.
  2. I didn't tune them, so please only take them as a starting point.

RPN

^(\((?![ ]*[+/*-])(?:[ ]*|[+/*-](?![ ]*[+/*-])|(?:[ ]*\d+)?[ ]*\d+[ ]*[+/*-](?![ ]*[+/*-])[ ]*|(?1))*\))

On the demo, you can see some sample strings that it matches, as well as "improper strings" where it fails.

Non-RPN

^(\((?![*\/])(?:\d++|[+*\/-](?![+*\/-])|(?1))*(?<![+*\/-])\))

On the demo, you can see some sample strings that it matches, as well as "improper strings" where it fails.

Discussion

  • There are probably cases I haven't thought of where the patterns allow improper operations. I did this fast. If you find one, let me know and we'll look for a fix. :)
  • I didn't deal with decimals—a simple addition, leaving that for you to do if you choose to.
  • In RPN I haven't dealt with negative numbers (which typically have a special key on the calculator): instead, you can -1* for instance
  • For the sake of tidiness, on the non-RPN one I have not allowed any spaces, but it wouldn't be hard to do so.

Related

When no parentheses are required, things are easier. In this related question, we're able to use a short-and-tidy

^\s*-?\d+(?:\s*[-+*/]\s*\d+\s*)+$
Community
  • 1
  • 1
zx81
  • 41,100
  • 9
  • 89
  • 105
  • I cant get this to work in anything other than the PHP version your demo runs in. Any chance you have a JS or Ruby version that works? – Automatico Oct 27 '14 at 17:52