0

For a certain Java homework of mine I've been tasked with evaluating an arithmetic expression and deciding whether or not it is a valid expression. The expressions contain 3 types of brackets ( {},[],() ), digits,whitespace, and the +-* and / operators. However, the strings could also contain random junk in which case I shouldnt parse the expression at all.

I have implemented the following regex, and hopefully it will work:

Pattern p = Pattern.compile("[^0-9\\[-\\]\\{-\\}\\(-\\)+*\\/\\s-]");

However, I am not very familiar with regex and was wondering if anyone could take a second look?

Bot Fred
  • 37
  • 6
  • If nested brackets are allowed, regular expressions are unlikely to help validate the expression: [Can regex match nested patterns](https://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns), [regular expression to match balanced parentheses](https://stackoverflow.com/questions/546433/regular-expression-to-match-balanced-parentheses) – Nowhere Man Nov 11 '21 at 22:32
  • The goal is not to match brackets, just to verify that all the characters in the string are composed of bracket characters, digits, or operators. I will match brackets seperately using stacks of chars. – Bot Fred Nov 12 '21 at 01:02

2 Answers2

2

The suggested syntax isn't regular. The 'regular' in 'Regular Expression' isn't just an arbitrary word, nor is it the last name of its inventor. It's a description of a subset of all imaginable syntaxes.

If a syntax isn't regular, a regex can't parse it properly. Many, many things aren't regular. Basic math (pretty much everything with hierarchy/recursive elements) isn't.

You can't use regexes to parse this stuff.

It sounds like your plan is: "Let's first reject invalid inputs, and then I'll start thinking about where to go from there". This is the wrong approach, you've got it twisted around. Start parsing the input; if it's invalid you'll figure it out trivially as you process it.

The answer to 'parse e.g. 5 + (2 * {3 * 10}) * [6 / 2]' does not involve regular expressions in any way.

note that \\{-\\} means: "All characters whose unicode values lie between the unicode value of the { char and the unicode value of the } char are valid", which you didn't mean there - you wanted just \\{\\} (as in: The { character and the } character). Once you fix this, your regex will correctly 'detect' any characters that are flat out invalid, such as the letter 'a'. However, it will allow a great many things that aren't valid, such as:

  • (
  • {} * ()
  • 5+++2
  • ---5555---
  • ///
  • (5+2}
  • 99999999999999999999999999 (that's more than fits in int)
  • 5 2
  • {(5 + 2) * (3 + [5 + 2)}

you can't write a regexp that properly denies all these.

rzwitserloot
  • 85,357
  • 5
  • 51
  • 72
  • right! After I've thrown out any blatanly invalid ones, I will use Djikstra's Algorithm with two stacks to evaluate the arithmetic. I am not trying to do so with just regex :D I just need to see if that particular line is worth parsing and not a junk string, first. – Bot Fred Nov 12 '21 at 00:57
  • The reason I am doing this is that the homework states to println "Invalid Parse" for any junk strings, and to attempt to evaluate any invalid strings made up of the proper characters. If it follows proper math convention, do something, if not do something else.. – Bot Fred Nov 12 '21 at 01:00
  • You _cant_ do it with regexp. I'm not sure how I can spell out __impossible__ any better. You can trivially recognize the thing is gobbledygook as part of your dijkstra's. – rzwitserloot Nov 12 '21 at 17:31
1

The regular expression to validate just the mentioned set of characters may be simplified and used with String::matches:

// return true if expression contains only valid characters, false otherwise
private static boolean hasValidChars(String expr) {
    return expr.matches("[-+*/0-9\\s(){}\\[\\]]*");
}

For the given set of characters, only square brackets [, ]need to be escaped while used inside character range, - should not be escaped being the first character in the range.


If the regular expression should return true if invalid character is detected, existing expression should be negated:

private static boolean hasInvalidChars(String expr) {
    return expr.matches("(?![-+*/0-9\\s(){}\\[\\]]*$).*");
}

or the character set could be negated [^...] (.* need to be supplied to look for any invalid character in the expression, as matches checks the entire string.)

private static boolean hasInvalidChars(String expr) {
    return expr.matches(".*([^-+*/\\s0-9(){}\\[\\]]).*");
}

Tests:

for (String expr : Arrays.asList("(123 + 456) - 789", "abc + 322", "(33 / [11 - x])")) {
    System.out.println(expr);
    System.out.println("invalid? " + hasInvalidChars(expr));
    System.out.println("valid? " + hasValidChars(expr));
    System.out.println("---");
}

Output:

(123 + 456) - 789
invalid? false
valid? true
---
abc + 322
invalid? true
valid? false
---
(33 / [11 - x])
invalid? true
valid? false
---
Nowhere Man
  • 19,170
  • 9
  • 17
  • 42