regex match html element with html children

Question

Possible Duplicate:
How to parse and process HTML with PHP?

I wasn't sure how to phrase this question.

Basically, I have this PHP code:

$new_html = preg_replace('!<div.*?id="spotlight".*?>.*?</div>!is', '', $html);

I want it to change HTML from this (an example, not the actual HTML):

<div id="container">
    <div id="spotlight">
        <!-- empty -->
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>

To this:

<div id="container">
    <div id="content">
        <!-- lots of content -->
    </div>
</div>

As you can see, the PHP code handles this case, because the regex is looking for:

<div{anything}id="spotlight"{anything}>{anything}</div>

However, if the div id="spotlight" contains a child div like so:

<div id="container">
    <div id="spotlight">
        <div></div>
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>

then the regex will match the end div tag of the child div!
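
A quick way to see this, assuming the nested example above is stored in a hypothetical `$html` string, is to print what the pattern actually matches:

```php
<?php
// Hypothetical input: the nested example from the question.
$html = <<<'HTML'
<div id="container">
    <div id="spotlight">
        <div></div>
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>
HTML;

// Same pattern as in the question.
$pattern = '!<div.*?id="spotlight".*?>.*?</div>!is';

// $m[0] holds the full text the pattern matched.
preg_match($pattern, $html, $m);
echo $m[0];
```

The printed match ends at the child's `</div>`, so the spotlight div's own closing tag is left behind. It also begins at `<div id="container">` rather than at the spotlight tag, because with the `s` flag the first `.*?` can run across `>` and line breaks.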

How do I prevent this? How do I tell the regex to ignore a closing div if another div was opened?

Thanks

Do you have control in the code? If so just edit it! If not you cannot be writing PHP to process it. — Ed Heal, Sep 24 '12 at 12:29
There's no way I can edit the code directly. This MUST be done using regex — AlexMorley-Finch, Sep 24 '12 at 12:30
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 — Kris, Sep 24 '12 at 12:34
Please refrain from parsing HTML with RegEx as it will [drive you į̷̷͚̤̤̖̱̦͍͗̒̈̅̄̎n̨͖͓̹͍͎͔͈̝̲͐ͪ͛̃̄͛ṣ̷̵̞̦ͤ̅̉̋ͪ͑͛ͥ͜a̷̘͖̮͔͎͛̇̏̒͆̆͘n͇͔̤̼͙̩͖̭ͤ͋̉͌͟eͥ͒͆ͧͨ̽͞҉̹͍̳̻͢](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) instead. — Madara's Ghost, Sep 24 '12 at 12:40
Using regex is not the best way to do this. You should be using a Dom parser like [DomDocument](http://php.net/manual/en/class.domdocument.php). It will make your life easier and your code will be easier to maintain / extend. — Wayne Whitty, Sep 24 '12 at 12:29
If you're actually looking for a regular expression, this should give you some pointers: http://kore-nordmann.de/blog/parse_with_regexp.html - but, well, you must really want to eat the pain then. — hakre, Sep 24 '12 at 13:44

Mihai Iorga · Answer 1 · 2012-09-24T12:40:48.983

Use DOMDocument:

$html = '<div id="container">
    <div id="spotlight">
        <!-- empty -->
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>';

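$dom = new DOMDocument;
// loadXML() expects well-formed markup with a single root element.
$dom->loadXML($html);

// Select every div whose id attribute is "spotlight", at any depth.
$xpath = new DOMXPath($dom);
$query = '//div[@id="spotlight"]';
$entries = $xpath->query($query);

// Detach each match from its parent; its children are removed with it.
foreach ($entries as $one) {
    $one->parentNode->removeChild($one);
}

echo $dom->saveHTML();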
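
The code above calls `loadXML`, which expects well-formed XML. Real pages often are not, so here is a variant sketch that uses `loadHTML` instead. The input is a hypothetical fragment in which the spotlight div has a child div and an unclosed `<br>`. The two libxml flags assume PHP 5.4 or later and a fragment with a single root element.

```php
<?php
// Hypothetical fragment: the spotlight div has a child, and <br> is not valid XML.
$html = '<div id="container">
    <div id="spotlight">
        <div>nested<br></div>
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>';

$dom = new DOMDocument;
// Collect libxml warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);
// The flags ask libxml not to add <html>/<body> wrappers or a doctype.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

// Remove the spotlight div together with everything nested inside it.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@id="spotlight"]') as $node) {
    $node->parentNode->removeChild($node);
}

$result = $dom->saveHTML();
echo $result;
```

Because the parser builds a tree, the child div is removed along with its parent and no tag counting is needed.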


ZiTAL · Answer 2 · score 0 · answered Sep 24 '12 at 12:34

$a = preg_replace('/<div[^>]+>\\s+<\/div>/', '', $a);
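
The pattern above only removes a div whose body is pure whitespace, so it does not cover the nested case from the question. If a regex really is the only option, PCRE's recursive subpatterns can pair opening and closing tags. The following is a sketch under assumptions, not a general HTML parser: it assumes the id is written exactly as `id="spotlight"`, that `<div` never appears inside comments, scripts, or attribute values, and that the input is small enough to stay within PCRE's backtracking limits. The subpattern name `div` and the variable names are illustrative.

```php
<?php
// Hypothetical input: the nested example from the question.
$html = <<<'HTML'
<div id="container">
    <div id="spotlight">
        <div></div>
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>
HTML;

// (?(DEFINE)...) declares a named subpattern "div" that matches one balanced
// <div>...</div>, calling itself via (?&div) for every nested div it meets.
$pattern = '~
    (?(DEFINE)
        (?<div> <div\b[^>]*> (?: (?!</?div\b). | (?&div) )*+ </div> )
    )
    <div\b[^>]*\sid="spotlight"[^>]*>   # opening tag of the target div
    (?: (?!</?div\b). | (?&div) )*+     # plain text or whole nested divs
    </div>                              # the closing tag that pairs with it
~isx';

$result = preg_replace($pattern, '', $html);
if ($result === null) {
    // preg_replace returns null on PCRE errors such as hitting a backtrack limit.
    throw new RuntimeException('PCRE error code ' . preg_last_error());
}
echo $result;
```

Using `[^>]*` inside the opening tag keeps the match from starting at an earlier div, and the possessive `*+` stops the engine from retrying the body once a balanced run has been consumed. A DOM parser remains the more robust choice for markup that is not under your control.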
