
I'd like to remove <font> tags from my html and am trying to use replaceChild to do so, but it doesn't seem to work properly. Can anyone catch what might be wrong?

$html = '<html><body><br><font class="heading2">Limited Size and Resources</font><p><br><strong>Q: When can a member use the limited size and resources exception?</strong></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$font_tags = $dom->GetElementsByTagName('font');

foreach($font_tags as $font_tag) {
  foreach($font_tag as $child) {
    $child->replaceChild($child->nodeValue, $font_tag);
  }
}

echo $dom->saveHTML();

From what I understand, $font_tags is a DOMNodeList, so I need to iterate through it twice in order to use the DOMNode::replaceChild function. I then want to replace the current value with just the content inside of the tags. However, when I output the $html nothing changes. Any ideas what could be wrong?

Here is a PHP Sandbox to test the code.

neuquen

3 Answers


I'll put my remarks inline:

$html = '<html><body><br><font class="heading2">Limited Size and Resources</font><p><br><strong>Q: When can a member use the limited size and resources exception?</strong></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$font_tags = $dom->GetElementsByTagName('font');

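/* You only need one loop, as it is iterating your collection.
   You would only need a second loop if each font tag had children of its own.
   iterator_to_array() copies the list first: the DOMNodeList returned by
   getElementsByTagName() is live, so replacing tags inside a plain foreach
   can skip some <font> tags when there are several.
*/
foreach(iterator_to_array($font_tags) as $font_tag) {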
  /* replaceChild replaces children of the node being called
     So, to replace the font tag, call the function on its parent
     $prent will be that reference
  */
  $prent = $font_tag->parentNode;
   /* You can't insert arbitrary text, you have to create a textNode
      That textNode must also be a member of your document
   */
  $prent->replaceChild($dom->createTextNode($font_tag->nodeValue), $font_tag);

}

echo $dom->saveHTML();

Updated Sandbox: Hopefully I understood your requirements correctly
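One caveat worth checking, offered as a sketch rather than as part of the original answer: the DOMNodeList returned by getElementsByTagName() is live, so replacing nodes while iterating it directly can skip elements once the HTML contains more than one <font> tag. The snippet below uses made-up sample HTML with two tags and compares a plain foreach with one over iterator_to_array(), which copies the list before anything is modified.

```php
<?php
// Hypothetical sample HTML with two <font> tags.
$html = '<html><body><font>one</font> and <font>two</font></body></html>';

// Naive version: iterate the live DOMNodeList directly.
$naive = new DOMDocument();
$naive->loadHTML($html);
foreach ($naive->getElementsByTagName('font') as $font) {
    $font->parentNode->replaceChild($naive->createTextNode($font->nodeValue), $font);
}
// Count the <font> tags that survived the naive loop.
$leftNaive = $naive->getElementsByTagName('font')->length;

// Safer version: copy the list into a plain array before changing the tree.
$safe = new DOMDocument();
$safe->loadHTML($html);
foreach (iterator_to_array($safe->getElementsByTagName('font')) as $font) {
    $font->parentNode->replaceChild($safe->createTextNode($font->nodeValue), $font);
}
$leftSafe = $safe->getElementsByTagName('font')->length;

echo "naive loop left $leftNaive <font> tag(s)\n";
echo "snapshot loop left $leftSafe <font> tag(s)\n";
echo $safe->saveHTML();
```

With a single <font> tag, as in the question, both loops behave the same; the difference only shows up once a second tag shifts into the position the naive iterator has already passed.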

Gary
  • Man! I was so close to this solution before I made a few steps back. I had `$font_tag->parentNode->replaceChild($font_tag->nodeValue, $font_tag);` but kept getting the error 'Argument 1 passed to DOMNode::replaceChild() must be an instance of DOMNode' which is why I thought I had to iterate twice. In this case, I should have just added `createTextNode`... – neuquen Sep 02 '14 at 21:37
  • It's not hard to get close to a real answer then wind up 10 miles away. So I take it this did what you were looking for? – Gary Sep 02 '14 at 21:38
  • Still testing, but I will definitely mark as correct if it works. Thanks! – neuquen Sep 02 '14 at 21:40
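For reference, a small sketch reproducing the error from the first comment. It assumes PHP 8, where passing a string to an internal method that expects a DOMNode throws a TypeError (older versions reported it with the wording quoted above). Wrapping the text with createTextNode() is what makes the call valid.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><font class="heading2">Limited Size and Resources</font></body></html>');
$font = $dom->getElementsByTagName('font')->item(0);

$caught = false;
try {
    // A plain string is not a DOMNode, so this call is rejected (PHP 8: TypeError).
    $font->parentNode->replaceChild($font->nodeValue, $font);
} catch (TypeError $e) {
    $caught = true;
    echo 'TypeError: ', $e->getMessage(), "\n";
}

// Wrapping the text in a DOMText node created by the same document works.
$font->parentNode->replaceChild($dom->createTextNode($font->nodeValue), $font);
echo $dom->saveHTML();
```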

First, let us find out what wasn't working in your code.

  1. The body of foreach($font_tag as $child) never runs, because $font_tag is a single <font> element taken from the $font_tags DOMNodeList, not a list itself.

  2. $child->replaceChild($child->nodeValue, $font_tag); has it backwards: replaceChild() is called on a parent node and replaces one of that parent's children, so a child cannot use it to replace its own parent ($font_tag). Its first argument must also be a DOMNode, not a string.
    For more details, see the PHP documentation for DOMNode::replaceChild, or point 2 below my code.

  3. echo $html prints the original $html string, not the $dom object we are modifying; to see the changes you need $dom->saveHTML(), as your code already does.


This works:

$html = '<html><body><br><font class="heading2">Limited Size and Resources</font><p><br><strong>Q: When can a member use the limited size and resources exception?</strong></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$font_tags = $dom->GetElementsByTagName('font');

foreach(iterator_to_array($font_tags) as $font_tag) // copy first: the live list shrinks as tags are replaced
{
    $new_node = $dom->createTextNode($font_tag->nodeValue);
    $font_tag->parentNode->replaceChild($new_node, $font_tag);
}

echo $dom->saveHTML();
  1. I create $new_node with $dom->createTextNode(), so the node is owned by $dom (although it is not part of the tree yet). replaceChild() needs the new node to belong to the same document as the node it replaces.

  2. To replace the child $font_tag, we first go up to its parent through the parentNode property and call replaceChild() there.

  3. Finally, we print the modified $dom with saveHTML(), which serializes the DOMDocument to an HTML string.
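Point 1 matters most when the replacement node is built somewhere else. The sketch below assumes a hypothetical case where the replacement <h2> is created in a second DOMDocument. With the classic DOMDocument API, replaceChild() expects the new node to be owned by $dom and rejects a node from another document ("Wrong Document Error"), so the node is copied in with importNode() first.

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<html><body><font class="heading2">Limited Size and Resources</font></body></html>');
$font = $dom->getElementsByTagName('font')->item(0);

// Hypothetical: the replacement node is created in a *different* document.
$other = new DOMDocument();
$heading = $other->createElement('h2', $font->nodeValue);

// Copy the node into $dom (true = include its children), then swap it in.
$imported = $dom->importNode($heading, true);
$font->parentNode->replaceChild($imported, $font);

echo $dom->saveHTML();
```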

Chique

To remove a specific span tag from HTML while keeping its inner content, using PHP and DOMDocument:

<?php

$content = '<span style="font-family: helvetica; font-size: 12pt;"><div>asdf</div><span>TWO</span>Business owners are fearful of leading. They would rather follow the leader than embrace a bold move that challenges their confidence. </span>';

$dom = new DOMDocument();
// LIBXML_HTML_NOIMPLIED stops libxml from adding implied <html>/<body> wrappers;
// LIBXML_HTML_NODEFDTD stops it from adding a default doctype.
$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

$xpath = new DOMXPath($dom);

foreach ($xpath->query('//span[@style="font-family: helvetica; font-size: 12pt;"]') as $span) {

    // Move all span tag content to its parent node just before it.
    while ($span->hasChildNodes()) {
        $child = $span->removeChild($span->firstChild);
        $span->parentNode->insertBefore($child, $span);
    }

    // Remove the span tag.
    $span->parentNode->removeChild($span);
}

// Get the final HTML with span tags stripped
$output = $dom->saveHTML();

echo $output;
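The same unwrapping technique also fits the original question's <font> tags when they contain markup rather than plain text; replacing with nodeValue, as in the other answers, would flatten something like <b>Size</b> into plain text. A minimal sketch, where unwrapTags() is a hypothetical helper name and the sample HTML adds a <b> element for illustration:

```php
<?php
/**
 * Replace every <$tag> element with its own children, keeping nested markup.
 * unwrapTags() is a hypothetical helper, not a DOMDocument method.
 */
function unwrapTags(DOMDocument $dom, string $tag): void
{
    // Snapshot the live list first, because the loop modifies the tree.
    foreach (iterator_to_array($dom->getElementsByTagName($tag)) as $el) {
        $parent = $el->parentNode;
        // insertBefore() moves a node that is already in the tree,
        // so this shifts each child, in order, to just before the element.
        while ($el->firstChild !== null) {
            $parent->insertBefore($el->firstChild, $el);
        }
        // The element is now empty; drop it.
        $parent->removeChild($el);
    }
}

$html = '<html><body><br><font class="heading2">Limited <b>Size</b> and Resources</font><p>text</p></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
unwrapTags($dom, 'font');
echo $dom->saveHTML();
```

Copying the list with iterator_to_array() matters here because getElementsByTagName() returns a live list, whereas the DOMXPath::query() result used in this answer is a fixed list of matches.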
MRMP