
I would like to convert a string that is formatted as an infix mathematical expression into an array of tokens using regular expressions. I'm very new to regular expressions, so forgive me if the answer to this question turns out to be trivial.

For example:

"31+2--3*43.8/1%(1*2)" -> ["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"]

I've already implemented a method that does this; however, it consists of many lines of code and a few nested loops. When I define more operators and functions, some of which may span multiple characters (such as log or cos), I figure it would be easier to edit a regex string than to add many more lines of code to my working function. Are regular expressions the right tool for this, and if so, where am I going wrong? Or am I better off extending my working parser?

I've already referred to the following SO posts:

How to split a string, but also keep the delimiters?

This one was very helpful, but I don't believe I'm using 'lookahead' correctly.

Validate mathematical expressions using regular expression?

The solution to the question above doesn't convert the string into an array of tokens. Rather, it checks to see if the given string is a valid mathematical expression.

My code is as follows:

import Foundation

func convertToInfixTokens(expression: String) -> [String]?
{
    do
    {
        let pattern = "^(((?=[+-/*]))(-)?\\d+(\\.\\d+)?)*"

        let regex = try NSRegularExpression(pattern: pattern)

        let results = regex.matches(in: expression, range: NSRange(expression.startIndex..., in: expression))

        return results.map
        {
            String(expression[Range($0.range, in: expression)!])
        }
    }
    catch
    {
        return nil
    }
}

When I do pass a valid infix expression to this function, it returns nil. Where am I going wrong with my regex string?

NOTE: I haven't even gotten to the point of trying to parse parentheses as individual tokens. I'm still figuring out why it won't work on this expression:

"-99+44+2+-3/3.2-6"

Any feedback is appreciated, thanks!

Rohan

1 Answer


Your pattern does not work because it can only match at the start of the string (see the ^ anchor). The (?=[+-/*]) positive lookahead then requires the first char to be one of the listed operators, but the only operator the pattern actually consumes is an optional -. (Note also that inside a character class +-/ is a range from + to /, so besides + and * the class also matches the comma, -, . and /.) So, with -99+44+2+-3/3.2-6, the first repetition of the group matches -99. When * tries to match the group a second time, it sees +44, and -?\d+ fails because -? does not know how to match +. The overall match therefore stops after -99.
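To see this concretely, here is a minimal sketch that runs the pattern from the question against the failing input. The helper name `allMatches(of:in:)` is hypothetical; it simply wraps `NSRegularExpression.matches(in:range:)` the same way the question's function does.

```swift
import Foundation

// Hypothetical helper: returns every substring matched by `pattern` in `text`, in order.
func allMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// The pattern from the question. Because of `^`, it can only match at the very start,
// and the repeated group stops as soon as it reaches "+44".
let original = "^(((?=[+-/*]))(-)?\\d+(\\.\\d+)?)*"
print(allMatches(of: original, in: "-99+44+2+-3/3.2-6"))  // prints ["-99"]
```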

You may tokenize the expression using

let pattern = "(?<!\\d)-?\\d+(?:\\.\\d+)?|[-+*/%()]"

Details

  • (?<!\d) - there should be no digit immediately to the left of the current position
  • -? - an optional -
  • \d+ - 1 or more digits
  • (?:\.\d+)? - an optional sequence of . followed by 1 or more digits
  • | - or
  • [-+*/%()] - a single -, +, *, /, % or parenthesis.
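As a rough illustration of the (?<!\d) item above, the following sketch (the `tokenize` helper is hypothetical) shows that a - directly after a digit comes out as a separate operator token, while a - after another operator is folded into the number.

```swift
import Foundation

// Hypothetical helper that applies the answer's pattern to `text`.
func tokenize(_ text: String) -> [String] {
    let pattern = "(?<!\\d)-?\\d+(?:\\.\\d+)?|[-+*/%()]"
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// "-" preceded by a digit: the lookbehind blocks the number branch, so it is an operator.
print(tokenize("2-3"))   // ["2", "-", "3"]
// "-" preceded by another "-": the number branch applies, giving a negative number.
print(tokenize("2--3"))  // ["2", "-", "-3"]
```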

Output using your function:

Optional(["31", "+", "2", "-", "-3", "*", "43.8", "/", "1", "%", "(", "1", "*", "2", ")"])
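Putting it together, here is a minimal sketch of the question's function with only the pattern replaced (it needs `import Foundation` for `NSRegularExpression`), applied to the second example from the question:

```swift
import Foundation

func convertToInfixTokens(expression: String) -> [String]? {
    // A number with an optional unary minus, or a single operator/parenthesis.
    let pattern = "(?<!\\d)-?\\d+(?:\\.\\d+)?|[-+*/%()]"
    do {
        let regex = try NSRegularExpression(pattern: pattern)
        let range = NSRange(expression.startIndex..., in: expression)
        return regex.matches(in: expression, range: range).map {
            String(expression[Range($0.range, in: expression)!])
        }
    } catch {
        // Only reached if the pattern itself fails to compile.
        return nil
    }
}

if let tokens = convertToInfixTokens(expression: "-99+44+2+-3/3.2-6") {
    print(tokens)  // ["-99", "+", "44", "+", "2", "+", "-3", "/", "3.2", "-", "6"]
}
```

Characters that neither alternative matches, such as spaces, are skipped silently rather than reported, so this only tokenizes; it does not validate the expression.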
Wiktor Stribiżew
  • Very nice answer. Thank you! – Rohan Mar 05 '19 at 17:27
  • I edited the regex string to parse "-" as a unary operator only if what precedes it is not a digit. In addition, I edited the last part so it checks for an operator/parenthesis instead of any char but a digit. It passed my tests, but let me know if you see a problem with it. Thanks! – Rohan Mar 05 '19 at 18:03
  • @Rohan Yes, that is fine; I added more details to the explanation. Note that there is no need to add a capturing group. If you only need to restrict the minus, you may group the lookbehind with it, `let pattern = "(?:(?<!\\d)-)?\\d+(?:\\.\\d+)?|[-+*/%()]"`, but it will work the same as your fix. – Wiktor Stribiżew Mar 05 '19 at 18:29
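To spot-check the claim that both variants behave the same, you can run them side by side on the two expressions from the question. This is only a check on these inputs, not a proof of equivalence, and the `tokens(_:pattern:)` helper is hypothetical.

```swift
import Foundation

// Hypothetical helper: all matches of `pattern` in `text`, in order.
func tokens(_ text: String, pattern: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// Lookbehind applied to the whole number alternative.
let answerPattern = "(?<!\\d)-?\\d+(?:\\.\\d+)?|[-+*/%()]"
// Lookbehind grouped with the minus only.
let groupedPattern = "(?:(?<!\\d)-)?\\d+(?:\\.\\d+)?|[-+*/%()]"

for input in ["31+2--3*43.8/1%(1*2)", "-99+44+2+-3/3.2-6"] {
    let same = tokens(input, pattern: answerPattern) == tokens(input, pattern: groupedPattern)
    print(input, same)
}
```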