Regex to find unmatched parentheses

Question

I need a regular expression that can find any unmatched brace (opening or closing) in a string potentially containing matching parentheses.

The question exists here on stackoverflow, but I haven't found a regex-based solution that works.

I came up with a regex that finds unmatched open braces \((?![^)]+\)) using a negative lookahead, but I can't seem to figure out the opposite one required for unmatched closing braces.

EDIT: The above regex to find unmatched open braces doesn't work as intended. E.g. it will miss cases where multiple open braces are followed by a single closing brace (see also comments)

Here is my test string that I've been experimenting with on Rubular:

one) ((two) (three) four) (five)))

Note that the string can contain any type of character, including quotes, dashes, etc.

That is because there is no regex-based solution that works in the general case. It is the same reason that you [cannot parse XML with regex](http://stackoverflow.com/a/1732454/13) (again, in the general case). — C. K. Young, Feb 15 '12 at 20:09
Does it really have to be a regular expression? Why not a simple loop with `string.each_char { |c| ... }`? — David Grayson, Feb 15 '12 at 20:10
No strict regexp language can.. However common extended regular expressions like PCRE may be able to. See http://stackoverflow.com/questions/562606/regex-for-checking-if-a-string-has-mismatched-parentheses — Kaganar, Feb 15 '12 at 20:15
Your regex doesn't do what you think it does. It finds any `(` that does not have a `)` following it. For example `((2+3)` will show no unmatched `(` because both of them are followed at some point by `)` — David Kanarek, Feb 15 '12 at 20:16
David, you're right about that. The negative lookahead just ensures that there is a closing bracket _somewhere_ and would miss the case you mentioned. — Andrea Singh, Feb 16 '12 at 13:52

score 10 · Accepted Answer · answered Feb 15 '12 at 20:16

10

The short answer is that you can't find unmatched parentheses with regular expressions. Regular expressions encode regular languages, whereas the language of all properly matched parentheses is a context-free language.

answered Feb 15 '12 at 20:16

Lars Kotthoff

107,425
16
204
204

Phrogz · Answer 2 · 2012-02-15T20:45:17.083

Here's a sort-of-regex-based solution :)

def balanced?( str, open='(', close=')' )
  re = Regexp.new( "[\\#{open}\\#{close}]" )
  str.scan(re).inject(0) do |lv,c|
    break :overclosed if lv < 0
    lv + (c==open ? 1 : -1)
  end == 0
end

s1 = "one) ((two) (three) four) (five)))"
s2 = "((one) ((two) (three) four) (five))"
s3 = "((one) ((two) (three) four) (five)"

puts balanced?(s1), #=> false
     balanced?(s2), #=> true
     balanced?(s3)  #=> false

score 2 · Answer 3 · answered Feb 17 '12 at 21:03

Ruby's Oniguruma library can parse LALR(n) grammars, including HTML. Citing the README:

  r = Regexp.compile(<<'__REGEXP__'.strip, Regexp::EXTENDED)
  (?<element> \g<stag> \g<content>* \g<etag> ){0}
  (?<stag> < \g<name> \s* > ){0}
  (?<name> [a-zA-Z_:]+ ){0}
  (?<content> [^<&]+ (\g<element> | [^<&]+)* ){0}
  (?<etag> </ \k<name+1> >){0}
  \g<element>
  __REGEXP__

  p r.match('<foo>f<bar>bbb</bar>f</foo>').captures

The above code is of course much simpler than a real HTML parser, but it matches nested tags. Also, you should note that it is incredibly simple to make a regex which would be very slow (in the range of minutes to parse a 80-symbol string).

It's better to use a real parser like Treetop for this task.

Regex to find unmatched parentheses

3 Answers3