Regex for getting everything inside a div including divs

Question

I need to get all the content of a div with a given class in PHP, and I have this pattern:

<div class="main">(.*?)</div>

But I have problems when the div has child divs, for example:

<div class="main">asdasd<div>jkjk</div></div>

The result for that is:

<div class="main">asdasd<div>jkjk</div>

I tried the conditional construct (?(?=regex)then|else), but I can't make it work ¯\_(ツ)_/¯

Regex is not the right solution for parsing HTML; I doubt this is even possible with just regex. — musefan, Jan 30 '15 at 10:59
Regex is not the correct tool for this. Googling "contents of div php" quickly leads to http://stackoverflow.com/questions/6491598/how-can-i-get-a-div-content-in-php. I suggest you attempt the methods described there. — Taemyr, Jan 30 '15 at 11:01
Besides the usual "don't parse html with regex": for your specific example (and most likely only this), just remove the `?` to make the quantifier greedy. Furthermore, no wonder `(?(?=regex)then|else)` won't work, I doubt you want to match `then` or `else` ;-) Feel free to show us what you have really tried using this construct. — KeyNone, Jan 30 '15 at 11:05

Taemyr · Answer 1 · 2015-01-30T11:29:47.353

Regexp started out as a tool to match regular languages.

Regular languages strike a fairly good balance between efficient recognition algorithms and expressiveness. It's easy to think that regular languages allow you to detect all interesting substrings.

However, there are limitations to regular languages. Of particular relevance to your problem is the fact that the language of matched parentheses is not regular: recognizing it requires tracking arbitrarily deep nesting, which a finite automaton cannot do. This means that no regular expression, in the formal sense, matches the language of matched parentheses.

This would be the end of the discussion except for the following: over time, the regexp language has expanded in ways that increase its expressive power beyond regular languages. In particular, PHP offers the recursive regexp operator (?R), which allows you to search for matching parentheses, or for matching <div> and </div> tags.
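To make the recursion concrete, here is a minimal sketch of (?R) applied to balanced parentheses. The pattern is a common PCRE idiom rather than something taken from this answer, and the sample string is an assumption chosen for illustration.

```php
<?php
// (?R) re-enters the whole pattern, so each "(" must be closed by its own ")".
// [^()]++ consumes runs of non-parenthesis characters without backtracking.
$pattern = '/\((?:[^()]++|(?R))*\)/';

// Assumed sample input.
$text = 'f(a(b)c) and (d)';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds each outermost balanced group found in $text.
print_r($matches[0]);
```

For this input the two outermost groups are `(a(b)c)` and `(d)`; the inner `(b)` is consumed by the recursive call rather than reported separately.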

You could look into the syntax of this operator and adapt it to your needs. You would, however, be wasting your time. Parsing HTML is a solved problem, and using a DOM parser will be more robust, easier to extend, and easier to understand for other coders, or for yourself when you return to your code later.
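As a rough illustration of the DOM approach, the sketch below uses PHP's bundled DOMDocument and DOMXPath classes on the markup from the question. The helper name `innerHtmlOfFirstDiv` is hypothetical, and the code assumes the `dom` extension is enabled and that the class name is a trusted, simple token.

```php
<?php
// Hypothetical helper (not from the answer): returns the inner HTML of the
// first <div> whose class list contains $class, or null if none is found.
// Assumption: $class is a trusted token without quotes or spaces.
function innerHtmlOfFirstDiv(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole token, so "main" does not match "mainframe".
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $div = $xpath->query($query)->item(0);
    if ($div === null) {
        return null;
    }

    // Serialize each child node; the outer <div> itself is left out.
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    return $inner;
}

echo innerHtmlOfFirstDiv('<div class="main">asdasd<div>jkjk</div></div>', 'main');
```

Because the parser builds a tree, nested divs need no special handling: the children of the matched node are serialized as they are, however deep the nesting goes.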

score 2 · Accepted Answer · answered Jan 30 '15 at 11:25

2

You should not parse HTML with regex; it is bound to fail somewhere. For your specific problem, you can use the recursion feature of PHP's regex engine:

<div\b(?:(?R)|(?:(?!<\/?div).))*<\/div>

See demo.

https://regex101.com/r/vD5iH9/15
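For reference, this is roughly how the pattern above could be used from PHP. It is a sketch under a few assumptions: the `~` delimiter and the `s` modifier (so that `.` also matches newlines) are additions not present in the original pattern, and the sample string is made up around the markup from the question.

```php
<?php
// The pattern from this answer, wrapped in "~" delimiters.
// Assumption: the "s" modifier is added so "." can cross line breaks.
$pattern = '~<div\b(?:(?R)|(?:(?!<\/?div).))*<\/div>~s';

// Assumed sample input: the question's markup with text on either side.
$html = 'x<div class="main">asdasd<div>jkjk</div></div>y';

if (preg_match($pattern, $html, $m)) {
    // $m[0] is the first balanced <div>...</div> block, nested divs included.
    echo $m[0];
}
```

Note that the pattern matches the first balanced div of any kind; it does not check for class="main", so it only gives the wanted block when that div comes first in the input.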

answered Jan 30 '15 at 11:25

vks

