
Possible Duplicate:
How to insert HTML to PHP DOMNode?

Continuing off of this question, here's the code I'm about to start using:

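function getHTMLDOMSegment( $file )
{
    $doc = new DOMDocument();
    // loadHTMLFile() uses libxml's forgiving HTML parser, so non-XHTML input is accepted
    $doc->loadHTMLFile( $file );
    // libxml wraps the parsed content in <html><body>
    $body = $doc->getElementsByTagName('body')->item(0);
    // the top-level nodes of the source <body>, to be imported into another document
    return $body->childNodes;
}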

Then I'd simply iterate over the children, using importNode() to copy each one into another HTML-loaded DOMDocument and appendChild() to attach it wherever it needs to go.
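A minimal sketch of that plan, assuming the source markup is available as a string: loadHTML() stands in for loadHTMLFile() so the example runs on its own, and the helper name importBodyChildren() is hypothetical. Each child of the source <body> is deep-copied with importNode() and then attached with appendChild().

```php
<?php
// Hypothetical helper: deep-copy every child of the source <body> into $target.
// loadHTML() on a string stands in for loadHTMLFile() to keep the sketch self-contained.
function importBodyChildren(string $sourceHtml, DOMElement $target): int
{
    $src = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // collect warnings from sloppy, non-XHTML markup
    $src->loadHTML($sourceHtml);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    $body = $src->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return 0; // nothing to import
    }

    $count = 0;
    foreach ($body->childNodes as $child) {
        // importNode() returns a copy owned by the target document; TRUE copies the whole subtree.
        $target->appendChild($target->ownerDocument->importNode($child, true));
        $count++;
    }
    return $count;
}

// Usage: import two paragraphs into the first <div> of another HTML document.
$dest = new DOMDocument();
$dest->loadHTML('<html><body><div></div></body></html>');
$slot = $dest->getElementsByTagName('div')->item(0);

$imported = importBodyChildren('<p>First <b>bold</b></p><p>Second<br>line</p>', $slot);
echo $dest->saveHTML();
```

Because importNode() produces copies owned by the target document, the source DOMNodeList is not modified while the loop runs.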

Does this sound right?

Edit: To be clear, since the source files I'm working with may not be "proper" XHTML, I need to go through loadHTMLFile like this no matter what, apparently.

Also, because this might be having to work with a large amount of HTML content, my goal is to be efficient as well.
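The "proper XHTML" concern is what separates the two approaches discussed in this thread: DOMDocumentFragment::appendXML() parses its input as XML, while loadHTML()/loadHTMLFile() use libxml's forgiving HTML parser. A small check, assuming an inline string with an unclosed <br> stands in for one of the real source files:

```php
<?php
// Markup that is valid HTML but not well-formed XML: the <br> is never closed.
$sloppy = '<p>Line one<br>Line two</p>';

libxml_use_internal_errors(true); // collect parser errors instead of printing warnings

// Route 1: appendXML() parses its input as XML, so the unclosed <br> is a parse error.
$dest = new DOMDocument();
$dest->loadHTML('<html><body></body></html>');
$fragment = $dest->createDocumentFragment();
$xmlOk = $fragment->appendXML($sloppy); // returns false when parsing fails

// Route 2: loadHTML() uses the HTML parser, which accepts the unclosed <br>.
$src = new DOMDocument();
$htmlOk = $src->loadHTML($sloppy);

libxml_clear_errors();

var_dump($xmlOk, $htmlOk);
```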

asked by Hamster (edited by Community)

1 Answer


I am somewhat reluctant to answer this question because it is basically just a longer version of what Artefacto and I already told you, but anyway.

You can either take the raw XML of the subtree and add it through a fragment with DOMDocumentFragment::appendXML

Or use DOMDocument::importNode

Note that importNode deep-imports entire trees when you pass TRUE as its second argument.
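To see what that second argument changes, here is a small sketch that imports the same element shallowly and deeply into an empty document (the markup is made up for illustration):

```php
<?php
$src = new DOMDocument();
$src->loadHTML('<div><h2>Hello</h2><p>World</p></div>');
$div = $src->getElementsByTagName('div')->item(0);

$dest = new DOMDocument();

$shallow = $dest->importNode($div, false); // copies the <div> element without its children
$deep    = $dest->importNode($div, true);  // copies the <div> together with its whole subtree

echo $shallow->childNodes->length, "\n"; // 0
echo $deep->childNodes->length, "\n";    // 2 (the <h2> and the <p>)
```

Both copies are owned by $dest but not yet attached anywhere; they still have to be placed with appendChild() or insertBefore().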

Examples:

$a = <<< HTML
<div>
    <h2>Hello World</h2>
    <p> I am from the <abbr title="source">src</abbr> document<br/></p>
</div>
HTML;

$b = <<< HTML
<h1>Importing Fragments</h1>
HTML;

Using fragments:

$dest = new DOMDocument;
$dest->loadHTML($b);
// appendXML() parses $a as XML, so $a must be well-formed
$fragment = $dest->createDocumentFragment();
$fragment->appendXML($a);
$dest->getElementsByTagName('h1')->item(0)->appendChild($fragment);
echo $dest->saveHTML();

Using imports:

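$dest = new DOMDocument;
$dest->loadHTML($b);
// parse $a with the HTML parser in a separate document
$src = new DOMDocument;
$src->loadHTML($a);
// copy the <div> into $dest (TRUE = with its whole subtree), then attach the copy to the <h1>
$dest->getElementsByTagName('h1')->item(0)->appendChild(
    $dest->importNode(
        $src->getElementsByTagName('div')->item(0),
        TRUE
    )
);
echo $dest->saveHTML();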

The output for both of these would be something like

<!DOCTYPE html PUBLIC 
    "-//W3C//DTD HTML 4.0 Transitional//EN" 
    "http://www.w3.org/TR/REC-html40/loose.dtd">

<html><body><h1>Importing Fragments<div>
    <h2>Hello World</h2>
    <p> I am from the <abbr title="source">src</abbr> document<br></p>
</div></h1></body></html>

If you are concerned about which one performs better, I suggest benchmarking both under real-world conditions, e.g. in your application, and then judging for yourself which suits your needs.
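A rough timing harness along those lines might look like this. It reuses the $a and $b names from the examples above but repeats the <div> 200 times; the helpers viaFragment(), viaImport() and timeIt(), the input size and the run count are all illustrative assumptions, so substitute your real documents. Unlike the example above, the import route here loops over every <body> child rather than importing only the first <div>.

```php
<?php
libxml_use_internal_errors(true); // keep parser warnings out of the timings

// Illustrative input: 200 copies of the example <div>.
$a = str_repeat('<div><h2>Hello World</h2><p>I am from the <abbr title="source">src</abbr> document<br/></p></div>', 200);
$b = '<h1>Importing Fragments</h1>';

// Fragment route: parse $a as XML directly inside the destination document.
function viaFragment(string $a, string $b): string
{
    $dest = new DOMDocument();
    $dest->loadHTML($b);
    $fragment = $dest->createDocumentFragment();
    $fragment->appendXML($a);
    $dest->getElementsByTagName('h1')->item(0)->appendChild($fragment);
    return $dest->saveHTML();
}

// Import route: parse $a as HTML in its own document, then deep-copy each <body> child.
function viaImport(string $a, string $b): string
{
    $dest = new DOMDocument();
    $dest->loadHTML($b);
    $src = new DOMDocument();
    $src->loadHTML($a);
    $h1 = $dest->getElementsByTagName('h1')->item(0);
    foreach ($src->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $h1->appendChild($dest->importNode($child, true));
    }
    return $dest->saveHTML();
}

// Wall-clock time for $runs calls of $fn, in seconds.
function timeIt(callable $fn, int $runs): float
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }
    return microtime(true) - $start;
}

$runs = 50;
$tFragment = timeIt(fn() => viaFragment($a, $b), $runs);
$tImport   = timeIt(fn() => viaImport($a, $b), $runs);
libxml_clear_errors();

printf("fragment: %.4fs, import: %.4fs over %d runs\n", $tFragment, $tImport, $runs);
```

memory_get_peak_usage() can be read the same way for memory, though it reports the peak for the whole process, so each route is best measured in a separate run.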

answered by Gordon
  • Serializing to a string and then back into a DOM data structure seems like it might be inefficient, though. At least, that's how I understand what appendXML does from its description... – Hamster Jan 19 '11 at 15:26
  • Also, by subtree do you mean the descendants of each of the body's immediate child elements? – Hamster Jan 19 '11 at 15:27
  • @Hamster You didn't say what you consider efficient in the last question either. Until you clarify that or provide a benchmark, I'd say it doesn't make a difference. By subtree I mean the entire node. In your specific use case, that means you would indeed have to iterate if you don't want to add XML that includes the body. – Gordon Jan 19 '11 at 15:31
  • Efficient in terms of CPU and memory use, I'm guessing. I could be dealing with a lot of HTML content here, potentially. – Hamster Jan 19 '11 at 15:46
  • @Hamster Well, like I said: if you have any doubts, benchmark. I don't know whether one performs better than the other. I've never noticed a significant overall impact in applications where I use them. – Gordon Jan 19 '11 at 15:48