I'm looking for an elegant way to remove everything inside an element with a given class, i.e. to remove all the HTML inside the div with class sometestclass, using PHP.

The function below only partly works: it also removes some parts of the page that I don't want removed. It is based on an original post (below):

$html = "<p>Hello World</p>
         <div class='sometestclass'>
           <img src='foo.png'/>
           <div>Bar</div>
         </div>";
$clean = removeDiv ($html,'sometestclass');
echo $clean;

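function removeDiv ($html, $removeClass) {
    // Parse the HTML string into a DOM tree
    $dom = new DOMDocument;
    $dom->loadHTML( $html );

    // Find every div whose class attribute equals $removeClass exactly
    $xpath = new DOMXPath( $dom );
    $removeString = ".//div[@class='$removeClass']";
    $pDivs = $xpath->query($removeString);

    // Detach each matching div (and everything nested in it) from its parent
    foreach ( $pDivs as $div ) {
        $div->parentNode->removeChild( $div );
    }

    // Keep only what is between <body> and </body> in the serialized document
    $output = preg_replace( "/.*<body>(.*)<\/body>.*/s", "$1", $dom->saveHTML() );
    return $output;
}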

Does anyone have any suggestions to improve the results of this?

The original post is here

Viktor
  • Do you want to remove the content of all divs? Or just divs with a specified class? – Patrick Q Mar 17 '14 at 15:54
  • I just want to remove a comment box from a blog. It's enclosed in a <div class='commentbox'>lots of stuff</div>, so what I'm really looking for is a simpler way to do this. It would scan the file for the first instance of the 'commentbox' class and then remove everything, including all the other <div>s nested inside it. Does that make sense? I don't want to touch the rest of the page, and I believe the <div class='commentbox'> only appears once on the page.
    – Viktor Mar 17 '14 at 16:17

1 Answer

You are not quoting the class name:

$removeString = ".//div[@class=$removeClass]";

should be:

$removeString = ".//div[@class='$removeClass']";
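
To illustrate why the quotes matter: without them, XPath treats the name as a child element to look up rather than as a string literal, so nothing matches. Even with quotes, @class='...' compares the whole attribute value, so a div with several classes is missed. The following is a minimal sketch, not a definitive fix; classXPath() is a hypothetical helper name, and it assumes the class name contains no single quote.

```php
<?php
// Hypothetical helper (not from the original post): builds an XPath that
// matches a div whose class attribute contains $class as a whole token.
function classXPath(string $class): string
{
    // Assumption: $class contains no single quote, since it is embedded
    // in a single-quoted XPath string literal.
    return ".//div[contains(concat(' ', normalize-space(@class), ' '), ' $class ')]";
}

$dom = new DOMDocument;
$dom->loadHTML("<p>Hello World</p><div class='sometestclass extra'><div>Bar</div></div>");

$xpath = new DOMXPath($dom);
$exact = $xpath->query(".//div[@class='sometestclass']"); // whole-value comparison
$token = $xpath->query(classXPath('sometestclass'));      // token comparison

echo $exact->length, "\n"; // the exact comparison misses "sometestclass extra"
echo $token->length, "\n"; // the token match finds the outer div
```

Whether the token match is needed depends on how the class attribute is written on the real page; if the div only ever carries the one class, the quoted exact match is enough.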
jeroen
  • That worked, thanks. The code is still buggy, though: it works for very simple HTML, but it produces warnings when I run it for real on a page. – Viktor Mar 17 '14 at 16:13
  • @Viktor What kind of warnings? – jeroen Mar 17 '14 at 16:21
  • Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag header invalid in Entity, line: 87 in /home/content/07/9157607/html/test.php on line 81
    Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag g:plusone invalid in Entity, line: 174 in /home/content/07/9157607/html/test.php on line 81
    Those are the warnings, but it also removes the Google+1 button from the page when I run the script. I could suppress the warnings, but the problem remains that it alters other sections of the page.
    – Viktor Mar 17 '14 at 16:27
  • @Viktor If your HTML is not valid, you could try running it through, for example, Tidy or HTML Purifier. – jeroen Mar 17 '14 at 16:29
  • Thanks, but that's not an option. This function is part of a page rewriting script: we fetch the page from the web and fix the HTML. We're writing something like Tidy, but it is specific to a blog that has a comment box that doesn't work. All we want to do is fetch the contents, remove the comment box, and be happily on our way. – Viktor Mar 17 '14 at 16:39
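
For the warnings quoted in the comments, one option is to have libxml collect parse errors internally instead of emitting them as PHP warnings. The sketch below assumes that is acceptable for this script; removeFirstByClass() is a hypothetical name, the sample HTML is made up, and the code has not been run against the real blog page. It removes only the first matching div and serializes the children of body directly instead of using a regex. It does not claim to explain why the Google+1 button disappears.

```php
<?php
// Sketch only: removeFirstByClass() is a hypothetical name, not from the thread.
function removeFirstByClass(string $html, string $class): string
{
    // Collect libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument;
    $dom->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Same quoted exact match as in the answer; take the first hit only.
    $xpath = new DOMXPath($dom);
    $node  = $xpath->query("//div[@class='$class']")->item(0);
    if ($node !== null) {
        $node->parentNode->removeChild($node);
    }

    // Serialize each child of <body> rather than cutting the string with a regex.
    $output = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $output .= $dom->saveHTML($child);
        }
    }
    return $output;
}

// Made-up input for illustration.
$html = "<header>Blog</header><p>Post</p>"
      . "<div class='commentbox'><div>nested</div><form></form></div>"
      . "<p>Footer</p>";
$clean = removeFirstByClass($html, 'commentbox');
echo $clean;
```

Silencing the errors only hides them. The parser still builds its own tree from the markup, so the serialized output can differ from the fetched source in places the script never touched.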