I've had a look and there don't seem to be any old questions that directly address this. I also haven't found a clear solution anywhere else.
I need a way to match a tag, open to close, and return everything enclosed by the tag. The regexes I've tried have problems when tags are nested. For example, the regex <tag\b[^>]*>(.*?)</tag> will cause trouble with <tag>Some text <tag>that is nested</tag> in tags</tag>. It will match <tag>Some text <tag>that is nested</tag>.
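A minimal reproduction of the problem, in case it helps. The `~` delimiters and the `s` modifier are just my choices for this sketch; the pattern itself is the one quoted above.

```php
<?php
// Minimal reproduction of the nesting problem with the lazy pattern.
$pattern = '~<tag\b[^>]*>(.*?)</tag>~s';
$html    = '<tag>Some text <tag>that is nested</tag> in tags</tag>';

preg_match($pattern, $html, $m);

// The lazy quantifier stops at the first closing tag, which belongs to the inner element.
echo $m[0], PHP_EOL; // <tag>Some text <tag>that is nested</tag>
echo $m[1], PHP_EOL; // Some text <tag>that is nested
```

What I actually want back in `$m[1]` is `Some text <tag>that is nested</tag> in tags`.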
I'm looking for a solution to this, ideally an efficient one. I've seen solutions that match start and end tags separately and keep track of their offsets in the content to work out which tags go together, but that seems wildly inefficient to me (if it's the only possible way then c'est la vie).
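For reference, the start/end tag bookkeeping I mean might look roughly like the sketch below. The function name `innerOfOutermostTags` is hypothetical, and it makes simplifying assumptions: it treats self-closing tags like `<tag/>` as opening tags, it ignores comments and attribute values containing `>`, and it silently skips unbalanced closing tags.

```php
<?php
/**
 * Hypothetical helper: returns the inner content of every outermost
 * <$tag>...</$tag> pair by pairing open and close tags with a depth counter.
 */
function innerOfOutermostTags(string $html, string $tag): array
{
    // Match opening and closing tags alike; group 1 is "/" for a closing tag.
    $pattern = '~<(/?)' . preg_quote($tag, '~') . '\b[^>]*>~i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    $results = [];
    $depth   = 0;
    $start   = 0;

    foreach ($matches as $m) {
        [$token, $offset] = $m[0];      // matched tag text and its byte offset
        $isClosing = $m[1][0] === '/';

        if (!$isClosing) {
            if ($depth === 0) {
                // Content begins right after the outermost opening tag.
                $start = $offset + strlen($token);
            }
            $depth++;
        } elseif ($depth > 0) {
            $depth--;
            if ($depth === 0) {
                // Back at depth zero: this closing tag pairs with the outermost opener.
                $results[] = substr($html, $start, $offset - $start);
            }
        }
    }

    return $results;
}

$html = '<tag>Some text <tag>that is nested</tag> in tags</tag>';
print_r(innerOfOutermostTags($html, 'tag'));
// The single element is: Some text <tag>that is nested</tag> in tags
```

This makes one regex pass to collect the tag tokens and one loop over them, but it still feels like reimplementing a parser by hand.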
The solution must be PHP only, as this is the language I have to work with. I'm parsing HTML snippets (think body sections from a WordPress blog and you're not too far off). If there is a better solution than regex, I'm all ears!
UPDATE:
Just to make it clear: I'm aware regexes are a poor solution, but I have to do it somehow, which is why the title specifically mentions better solutions.
FURTHER UPDATE:
I'm parsing snippets. Solutions should take this into account. If the parser only works on a full document, or is going to add <head> etc. when I get the HTML back out, it's not an acceptable solution.
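To illustrate the kind of behaviour I can't accept, here is what happens with PHP's bundled `DOMDocument` (used purely as an example of a document-oriented parser, and assuming the DOM extension is enabled) when it is called with default options on a snippet:

```php
<?php
// Requires ext-dom. The snippet is deliberately not a full document.
$snippet = '<p>Some <b>snippet</b> text</p>';

$doc = new DOMDocument();
$doc->loadHTML($snippet);

// With default options the serialized result is a whole document:
// a doctype plus <html> and <body> wrappers around the original snippet.
$out = $doc->saveHTML();
echo $out;
```

I need to get back exactly the markup I put in, without those wrappers.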