I'm trying to learn how to use PHP's DOM functions. As an exercise, I want to repair an invalid HTML fragment. So far, I've been able to produce a full document:
<?php
$fragment = '<div style="font-weight: bold">Lorem ipsum <div>dolor sit amet,
<strong><em class=foo>luptate</strong></em>. Excepteur proident,
<div class="bar">sunt in culpa</div> officia est laborum.';
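$doc = new DOMDocument;
// Collect libxml parse warnings internally instead of emitting PHP warnings
libxml_use_internal_errors(TRUE);
$doc->loadHTML($fragment);
// Discard the collected warnings and restore normal error reporting
libxml_clear_errors();
libxml_use_internal_errors(FALSE);
$doc->formatOutput = TRUE;
// Serialize the whole document, including the doctype and html/body wrappers
echo $doc->saveHTML();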
?>
... which prints:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><div style="font-weight: bold">Lorem ipsum <div>dolor sit amet,
<strong><em class="foo">luptate</em></strong>. Excepteur proident,
<div class="bar">sunt in culpa</div> officia est laborum.</div>
</div></body></html>
My questions:
- Is there a way to print only the HTML that corresponds to the original fragment?
- Is there a more appropriate built-in library for such a task?
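
For the first question, here is a rough sketch of one possible approach, not necessarily the right one. It assumes that `saveHTML()` accepts a node argument (available since PHP 5.3.6, as far as I know) and that the parser always places the fragment inside a `<body>` element. The `$body` and `$repaired` variable names are only for illustration.

```php
<?php
$fragment = '<div style="font-weight: bold">Lorem ipsum <div>dolor sit amet,
<strong><em class=foo>luptate</strong></em>. Excepteur proident,
<div class="bar">sunt in culpa</div> officia est laborum.';

$doc = new DOMDocument;

// Suppress parse warnings while loading the invalid markup,
// then restore whatever error mode was active before.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($fragment);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Assumption: the parser wraps the fragment in <html><body>...</body></html>,
// so the repaired fragment is the list of children of <body>.
$body = $doc->getElementsByTagName('body')->item(0);

$repaired = '';
if ($body !== null) {
    foreach ($body->childNodes as $child) {
        // Serialize each child node on its own, without doctype/html/body
        $repaired .= $doc->saveHTML($child);
    }
}

echo $repaired, "\n";
```

This serializes only what ended up under `<body>`, so the doctype and the `<html>`/`<body>` wrappers should not appear in the output. I have not checked how it behaves if the fragment contains elements that the parser moves into `<head>`.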