3

Given the following string in PHP:

$html = "<div>
<p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p>
</div>";

I just want to either empty the class attribute of, or remove, any element whose class contains "test2", so the result would be this:

<div>
<p><span class=''>text 1</span></p>
<p><span class=''>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class=''>text 4</span></p>
</div>

Or, if you're removing the element:

<div>
<p>text 1</p>
<p>text 2</p>
<p><span class='test1'>text 3</span></p>
<p>text 4</p>
</div>

I'm happy to use a regex or something like PHP Simple HTML DOM Parser, but I have no clue how to use the latter. With regex, I know how to find the element, but not a specific attribute on it, especially when the attribute holds multiple values as in my example above. Any ideas?

James Nine

4 Answers

6

The DOMDocument class is a straightforward, easy-to-understand interface for working with your data in a DOM-like fashion. Querying the DOM with XPath selectors should make the task all the more trivial:

Clear All Classes

// Build our DOMDocument, and load our HTML
$doc = new DOMDocument();
$doc->loadHTML($html);

// Preserve a reference to our DIV container
$div = $doc->getElementsByTagName("div")->item(0);

// New-up an instance of our DOMXPath class
$xpath = new DOMXPath($doc);

// Find all elements whose class attribute has test2
$elements = $xpath->query("//*[contains(@class,'test2')]");

// Cycle over each, remove attribute 'class'
foreach ($elements as $element) {
    // Empty out the class attribute value
    $element->attributes->getNamedItem("class")->nodeValue = '';
    // Or remove the attribute entirely
    // $element->removeAttribute("class");
}

// Output the HTML of our container
echo $doc->saveHTML($div);
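
Remove the Element Instead

The question also asks for a variant that drops the matching span and keeps its text. The following is only a sketch of one way to do that with the same DOMDocument and DOMXPath classes, not part of the original answer. It assumes the sample markup from the question, and it swaps in a stricter XPath expression that treats "test2" as a whole class name, so a class such as "test22" would not match. The variable names $query, $matches and $result are chosen here for illustration.

```php
<?php
// Sample markup from the question.
$html = "<div>
<p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p>
</div>";

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Pad the class value with spaces so "test2" only matches as a whole token.
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' test2 ')]";

// Copy the result into an array so the loop is unaffected by tree changes.
$matches = iterator_to_array($xpath->query($query));

foreach ($matches as $element) {
    // Move each child (here, the text node) in front of the element...
    while ($element->firstChild !== null) {
        $element->parentNode->insertBefore($element->firstChild, $element);
    }
    // ...then remove the now-empty element itself.
    $element->parentNode->removeChild($element);
}

// Output the HTML of the container.
$div = $doc->getElementsByTagName("div")->item(0);
$result = $doc->saveHTML($div);
echo $result;
```

The output should resemble the second result in the question, except that DOMDocument serialises attribute values with double quotes.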
Sampson
4

Using the PHP Simple HTML DOM Parser

Updated and tested! You can get the simple_html_dom.php include from the above link or here.

Setup for both cases:

include('../simple_html_dom.php');

$html = str_get_html("<div><p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p></div>");

Case 1 (empty the class attribute):

foreach ($html->find('span[class*="test2"]') as $e) {
    $e->class = '';
}

echo $html;

Case 2 (remove the element, keep its text):

foreach ($html->find('span[class*="test2"]') as $e) {
    $e->parent()->innertext = $e->plaintext;
}

echo $html;
Josh
  • case 1 throws: "Warning: Attempt to assign property of non-object" case 2 throws: Parse error: syntax error, unexpected '[', expecting ')' Am I doing something wrong? I started it with: $html = new simple_html_dom(); $html->load( ... the html string above ... ); – James Nine Jan 21 '10 at 11:32
  • what version of php are you running? – Josh Jan 21 '10 at 14:08
  • sorry - i've updated and tested the code - it is now working. i think this method is much easier to read what is going on. – Josh Jan 23 '10 at 21:44
  • having used jquery quite a lot i like the similar easy syntax the PHP Simple HTML DOM Parser uses. not sure though about the overhead it causes but for small/medium sites i think its really easy to use. – Josh Jan 23 '10 at 22:17
3
// $src holds the HTML string (called $html in the question)
$notest2 = preg_replace(
    "/class\s*=\s*'[^\']*test2[^\']*'/",
    "class=''",
    $src
);

C.

symcbean
  • Don't use regex to parse html :( We have been over this a gazillion times!! Please look at http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – AntonioCS Jan 23 '10 at 21:51
  • @kemp: That's the wrong way to think when you are trying to do things. Do stuff the right way and probably you won't have any problems in the future, do them in which ever manner works and it will come back to bite you in the butt – AntonioCS Jan 23 '10 at 22:01
  • I just don't get this holy war: this **is not** parsing HTML, it's a simple text search and replace. The "right way" doesn't exist, it totally depends on the context. – Matteo Riva Jan 23 '10 at 22:09
1

You can use any DOM parser and iterate over every element. Check whether its class attribute contains the test2 class (strpos()); if so, set the class attribute to an empty string.
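
A minimal sketch of that loop, assuming PHP's built-in DOMDocument as the parser and the sample markup from the question. Instead of strpos(), it splits the attribute into individual class names, so that a class like "test22" is not treated as a match; that substitution is a choice made here, not something this answer prescribes.

```php
<?php
// Sample markup from the question.
$html = "<div>
<p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p>
</div>";

$doc = new DOMDocument();
$doc->loadHTML($html);

// Visit every element in the document.
foreach ($doc->getElementsByTagName('*') as $element) {
    if (!$element->hasAttribute('class')) {
        continue;
    }
    // Split the attribute value into individual class names.
    $classes = preg_split('/\s+/', trim($element->getAttribute('class')));
    if (in_array('test2', $classes, true)) {
        // Empty the attribute; removeAttribute('class') would drop it instead.
        $element->setAttribute('class', '');
    }
}

// Output the HTML of the container.
$div = $doc->getElementsByTagName('div')->item(0);
$result = $doc->saveHTML($div);
echo $result;
```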

You can also use regular expressions, which is much shorter. Just find and replace with preg_replace() using the following expression: #class=".*?test2.*?"#is
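
Note that the expression as written looks for double quotes, while the sample markup in the question uses single quotes, so it may find nothing there. The sketch below is one possible adjustment, not the original expression: it accepts either quote style, keeps the match inside a single attribute value, and only matches "test2" as a whole class name. The names $pattern and $result are illustrative, and a plain text replace like this would also touch any other attribute whose name ends in "class".

```php
<?php
// Sample markup from the question.
$html = "<div>
<p><span class='test1 test2 test3'>text 1</span></p>
<p><span class='test1 test2'>text 2</span></p>
<p><span class='test1'>text 3</span></p>
<p><span class='test1 test3 test2'>text 4</span></p>
</div>";

// (["\'])        captures the opening quote; \1 requires the same closing quote.
// (?:(?!\1).)*   matches characters without running past that closing quote.
// The lookarounds keep "test2" from matching inside "test22" or "my-test2".
$pattern = '#class\s*=\s*(["\'])(?:(?!\1).)*?(?<![\w-])test2(?![\w-])(?:(?!\1).)*\1#is';

// Assign the return value; preg_replace() does not modify $html in place.
$result = preg_replace($pattern, "class=''", $html);
echo $result;
```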

Crozin
  • I tried: preg_replace('#class=".*?test2.*?"#is', "", $html); but that did not work; did I do it wrong? – James Nine Jan 21 '10 at 11:26
  • It should be `$html = ....`. But use the solution proposed by Josh - it's better (I'd forgotten that we can so easily search for interesting elements). – Crozin Jan 21 '10 at 11:51
  • yep, i did "$html =" in the beginning of course. i'll have a look at Josh's answer though. – James Nine Jan 21 '10 at 12:06