Regular expressions started out as a tool for matching regular languages.
Regular languages strike a fairly good balance between efficient recognition algorithms and expressiveness. It's easy to think that regular languages allow you to detect all interesting substrings.
However, there are limitations to regular languages. Of particular relevance to your problem is the fact that the language of matched parentheses is not regular: recognizing it requires keeping track of arbitrarily deep nesting, which a finite automaton cannot do. This means that no regular expression, in the strict sense, matches the language of matched parentheses.
This would be the end of the discussion except for the following: over time, regexp syntax has expanded in ways that increase its expressive power beyond regular languages. In particular, PHP offers the recursive regexp operator (?R), which allows you to search for matching parentheses, or matching <div> and </div> tags.
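To give an idea of what the recursive operator looks like in practice, here is a minimal sketch for balanced parentheses. The helper name firstBalancedGroup is made up for this illustration, and the pattern only handles plain parentheses; it is not meant as a recipe for HTML.

```php
<?php
// Hypothetical helper (not part of PHP): returns the first balanced
// parenthesised group found in $text, or null if there is none.
function firstBalancedGroup(string $text): ?string
{
    // \(            an opening parenthesis
    // (?: ... )*    any number of:
    //   [^()]++       a run of non-parenthesis characters (possessive), or
    //   (?R)          the whole pattern again, i.e. a nested balanced group
    // \)            the matching closing parenthesis
    $pattern = '/\((?:[^()]++|(?R))*\)/';

    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

// The trailing "g(e" is unbalanced and is therefore not part of any match.
echo firstBalancedGroup('f(a, (b + c) * (d)) + g(e'), PHP_EOL;
```

Adapting this to <div> and </div> means swapping the parentheses for the tags, and the pattern gets noticeably harder to read once attributes, comments, and malformed markup enter the picture.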
You could look into the syntax of this operator and adapt it to your needs. You would, however, be wasting your time. Parsing HTML is a solved problem, and using a DOM parser will be more robust, easier to extend, and easier to understand for other coders, or for yourself when you return to your code later.
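As a rough sketch of the DOM route, assuming PHP's bundled DOM extension is available: the helper name divTexts and the sample markup are invented for this example, and your real code will likely want something other than the text content.

```php
<?php
// Hypothetical helper: returns the trimmed text content of every <div>
// in $html, in document order (an outer div comes before its children).
function divTexts(string $html): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid, so collect libxml warnings
    // internally instead of letting them be emitted as PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $texts = [];
    foreach ($doc->getElementsByTagName('div') as $div) {
        $texts[] = trim($div->textContent);
    }
    return $texts;
}

print_r(divTexts('<div>outer <div>inner</div> tail</div>'));
```

Nesting is handled by the parser, so there is no pattern to maintain, and each $div is a node you can query further for attributes or child elements.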