
Possible Duplicate:
How to parse and process HTML with PHP?

I wasn't sure how to phrase this question.

Basically I have this php code:

$new_html = preg_replace('!<div.*?id="spotlight".*?>.*?</div>!is', '', $html);

I want this to change html code from this (example, not actual html):

<div id="container">
    <div id="spotlight">
        <!-- empty -->
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>

To this:

<div id="container">
    <div id="content">
        <!-- lots of content -->
    </div>
</div>

As you can see, the PHP code does this successfully, because the regex is looking for:

<div{anything}id="spotlight"{anything}>{anything}</div>
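A caveat worth checking: `{anything}` here really can be anything, including other tags. Because the `s` modifier lets the first `.*?` run past a `>`, the leftmost match in the first example starts at the container's `<div`, not the spotlight's, so the container's opening tag is removed too. This sketch compares the original pattern with a variant that uses `[^>]*` so the attribute part cannot leave a single tag (`$original` and `$tightened` are illustrative names):

```php
<?php
// First example from the question (spotlight div without children)
$html = '<div id="container">
    <div id="spotlight">
        <!-- empty -->
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>';

// Original pattern: the first .*? may cross the container's opening tag
$original = preg_replace('!<div.*?id="spotlight".*?>.*?</div>!is', '', $html);

// Variant: [^>]* cannot cross a ">", so the match must start at the spotlight tag itself
$tightened = preg_replace('!<div[^>]*id="spotlight"[^>]*>.*?</div>!is', '', $html);

var_dump(strpos($original, 'id="container"'));  // bool(false): container's opening tag is gone
var_dump(strpos($tightened, 'id="container"')); // int(5)
```

The `[^>]*` variant still stops at the first `</div>` it meets, so on its own it does not help with nested divs.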

However, if the div with id="spotlight" contains a child div, like so:

<div id="container">
    <div id="spotlight">
        <div></div>
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>

then the regex will match the end div tag of the child div!
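The failure is easy to reproduce with `preg_match()`: the lazy `.*?</div>` stops at the first closing tag after the opener, and in the nested example that tag belongs to the child. A minimal sketch using the question's own pattern (`$m` and `$out` are illustrative names):

```php
<?php
// Nested example from the question
$html = '<div id="container">
    <div id="spotlight">
        <div></div>
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>';

preg_match('!<div.*?id="spotlight".*?>.*?</div>!is', $html, $m);

// The matched text ends at the child's closing tag, not the spotlight's
echo substr($m[0], -11), "\n"; // <div></div>

// After the replacement, closing tags outnumber opening tags
$out = preg_replace('!<div.*?id="spotlight".*?>.*?</div>!is', '', $html);
echo substr_count($out, '<div'), ' opening vs ', substr_count($out, '</div>'), " closing\n"; // 1 opening vs 3 closing
```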

How do I prevent this? How do I tell the regex to ignore a closing div if another div was opened inside it?

Thanks

AlexMorley-Finch
  • Do you have control in the code? If so just edit it! If not you cannot be writing PHP to process it. – Ed Heal Sep 24 '12 at 12:29
  • There's no way I can edit the code directly. This MUST be done using regex. – AlexMorley-Finch Sep 24 '12 at 12:30
  • http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Kris Sep 24 '12 at 12:34
  • You'd have to match an optional child div then – Ja͢ck Sep 24 '12 at 12:38
  • Please refrain from parsing HTML with RegEx as it will [drive you insane](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). Use an [HTML parser](http://stackoverflow.com/questions/292926/robust-mature-html-parser-for-php) instead. – Madara's Ghost Sep 24 '12 at 12:40
  • Can you afford `preg_replace_callback`? – Stan Sep 24 '12 at 12:53
  • Using regex is not the best way to do this. You should be using a DOM parser like [DOMDocument](http://php.net/manual/en/class.domdocument.php). It will make your life easier and your code will be easier to maintain / extend. – Wayne Whitty Sep 24 '12 at 12:29
  • If you're actually looking for a regular expression, this should give you some pointers: http://kore-nordmann.de/blog/parse_with_regexp.html - but, well, you must really want to eat the pain then. – hakre Sep 24 '12 at 13:44
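If a regex really is the only option, PCRE (the engine behind the `preg_*` functions) supports recursive subpatterns, which is one way to "match an optional child div" to any depth. A minimal sketch, assuming simple markup: no `<div` or `</div` inside comments, attribute values or scripts, and the `id` attribute written with double quotes. The group name `body` is illustrative. A DOM parser remains the more robust choice.

```php
<?php
$html = '<div id="container">
    <div id="spotlight">
        <div><div class="deeper"></div></div>
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>';

// x modifier: whitespace and # comments inside the pattern are ignored
$pattern = '~
    <div\b[^>]*\sid="spotlight"[^>]*>    # opening tag of the target div
    (?<body>                             # balanced content of a div
        (?:
            [^<]++                       # plain text, no tags
          | <(?!/?div\b)                 # any tag other than an opening or closing div
          | <div\b[^>]*> (?&body) </div> # a nested div, matched recursively
        )*
    )
    </div>                               # the closing tag that pairs with the opener
~ix';

$new_html = preg_replace($pattern, '', $html);
echo $new_html;
```

Because the recursion has to consume every nested `<div>...</div>` pair before the final `</div>` can match, the closing tag it finally matches is the one that belongs to the spotlight div.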

2 Answers


Use DOMDocument:

$html = '<div id="container">
    <div id="spotlight">
        <!-- empty -->
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>';

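$dom = new DOMDocument;
// loadXML() requires well-formed markup; it works for this sample,
// but real-world HTML usually needs loadHTML() instead
$dom->loadXML($html);

$xpath = new DOMXPath($dom);
// Select every <div> whose id attribute is "spotlight"
$query = '//div[@id="spotlight"]';
$entries = $xpath->query($query);

// The XPath result is a snapshot, so removing nodes while looping is safe
foreach ($entries as $node) {
    // Removing the element also removes everything nested inside it
    $node->parentNode->removeChild($node);
}

echo $dom->saveHTML();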


Mihai Iorga
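`loadXML()` only accepts well-formed XML, so it handles the sample above but will typically fail on real-world HTML (void tags such as `<br>`, unescaped `&`, unclosed `<p>`). A sketch of the same XPath approach with `loadHTML()`, assuming PHP 5.4+ built against libxml 2.7.8+ so that the `LIBXML_HTML_NOIMPLIED` and `LIBXML_HTML_NODEFDTD` flags are available; they stop libxml from wrapping the fragment in `<html><body>` and adding a doctype:

```php
<?php
// HTML that is not well-formed XML (the <br> is never closed)
$html = '<div id="container">
    <div id="spotlight">
        <div>Spotlight<br>text</div>
    </div>
    <div id="content">
        <!-- lots of content -->
    </div>
</div>';

// Collect parser warnings instead of printing them
libxml_use_internal_errors(true);

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@id="spotlight"]') as $node) {
    // Removes the div together with everything nested inside it
    $node->parentNode->removeChild($node);
}

$result = $dom->saveHTML();
echo $result;
```

`LIBXML_HTML_NOIMPLIED` is most predictable when the fragment has a single root element, as it does here.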
// Removes <div ...> elements that have attributes and contain nothing but whitespace
$a = preg_replace('/<div[^>]+>\\s+<\/div>/', '', $a);
ZiTAL
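The pattern in this answer only removes a `<div>` that has something after `div` in its opening tag (in practice, attributes) and contains nothing but whitespace. A quick check on a few hypothetical inputs shows its scope; in particular, a div holding a comment or a child div, like the question's `spotlight` div, is left alone:

```php
<?php
// Hypothetical input: whitespace-only div, bare empty div, div holding a comment
$a = '<div id="a">   </div><div></div><div id="b"><!-- x --></div>';

$result = preg_replace('/<div[^>]+>\\s+<\/div>/', '', $a);

// Only the attribute-bearing, whitespace-only div is removed
echo $result, "\n"; // <div></div><div id="b"><!-- x --></div>
```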