3

It was suggested to me that in order to close some "dangling" HTML tags, I should use PHP's DOM extension and loadHTML.

I've been trying for a while, searching for tutorials, reading this page, trying various things, but can't seem to figure out how to use it to accomplish what I want.

I have this string: <div><p>The quick brown <a href="">fox jumps...

I need to write a function which closes the opened HTML tags.

Just looking for a starting point here. I can usually figure things out pretty quick.

Jeff

4 Answers

3

This can be done with PHP's DOMDocument class, using the DOMDocument::loadHTML() and DOMDocument::normalizeDocument() methods.

<?php
    $html = '<div><p>The quick brown <a href="">fox jumps';

    $DDoc = new DOMDocument();
    $DDoc->loadHTML($html);
    $DDoc->normalizeDocument();

    echo $DDoc->saveHTML();
?>

Outputs:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd"> 
<html><body><div><p>The quick brown <a href="">fox jumps</a></p></div></body></html> 

From there, use substr and strpos to strip away the HTML that you don't want, like so:

<?php
    $html = '<div><p>The quick brown <a href="">fox jumps';

    $DDoc = new DOMDocument();
    $DDoc->loadHTML($html);
    $DDoc->normalizeDocument();

    $html = $DDoc->saveHTML();

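    # Remove everything before and including the opening <html><body> tags.
    $html = substr($html, strpos($html, '<html><body>') + 12);
    # Remove everything from the closing </body> tag onward. Searching for the tag
    # avoids assuming a fixed number of trailing characters (saveHTML() appends a newline).
    $html = substr($html, 0, strrpos($html, '</body>'));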

    echo $html;
?>
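
As a possible alternative to the string slicing above, here is a minimal sketch that serializes only the children of the generated body element. It relies on DOMDocument::saveHTML() accepting a node argument (PHP 5.3.6 and later). The function name closeOpenTags() is hypothetical and not part of the original answer, and the sketch assumes the input is a body-level fragment.

```php
<?php
// Hypothetical helper: the name and signature are assumptions for illustration.
function closeOpenTags(string $fragment): string
{
    // loadHTML() rejects empty input, so return early.
    if (trim($fragment) === '') {
        return '';
    }

    $doc = new DOMDocument();

    // Collect libxml warnings about malformed markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // Serialize each child of <body>, leaving out the doctype/html/body wrapper.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo closeOpenTags('<div><p>The quick brown <a href="">fox jumps');
```

Because the wrapper is never serialized, there are no offsets to compute. Note that loadHTML() does not treat input as UTF-8 by default, so non-ASCII text may need extra handling.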
Mark Tomlin
2

OK, what about http://htmlpurifier.org/ ? Also http://www.bioinformatics.org/phplabware/internal_utilities/htmLawed/index.php

Can you use Tidy? http://php.net/manual/en/book.tidy.php

Mike Crowe
2

While I'm sure you could get DOM to do what you want, I'm pretty sure you'd be better off with Tidy.
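
For reference, a minimal sketch of what the Tidy route might look like where the extension is installed. It uses tidy_repair_string() with the show-body-only option so that only the repaired fragment is returned. The helper name closeTagsWithTidy() and the choice of options are assumptions, not something taken from this thread.

```php
<?php
// Hypothetical helper: the name and the option choices are assumptions.
function closeTagsWithTidy(string $fragment): string
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is not available.');
    }

    $config = [
        'show-body-only' => true, // return only the body content, no <html> wrapper
        'wrap'           => 0,    // do not hard-wrap long lines
    ];

    $repaired = tidy_repair_string($fragment, $config, 'utf8');
    if ($repaired === false) {
        throw new RuntimeException('Tidy could not repair the fragment.');
    }
    return $repaired;
}

// Only run the example where the extension is actually present.
if (extension_loaded('tidy')) {
    echo closeTagsWithTidy('<div><p>The quick brown <a href="">fox jumps');
}
```

Tidy may also re-indent or add line breaks to the markup, so the result is not guaranteed to match the input byte for byte outside the added closing tags.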

tplaner
  • Wish I could, can't install Tidy on my server. – Jeff Nov 13 '09 at 20:06
  • While not as comprehensive, there are some functions out there that will go through and close any open tags. Two examples are http://snipplr.com/view/3618/close-tags-in-a-htmlsnippet/ and http://codesnippets.joyent.com/posts/show/959. – tplaner Nov 13 '09 at 20:28
  • I'm fairly sure there are some classes on www.phpclasses.org that will also close open tags. You might also look at ones built for XML, as they will basically work the same way. – tplaner Nov 13 '09 at 20:30
0

I think you're following the wrong approach: you should use the DOM to truncate the string, not to repair it after it has already been truncated.

This is how I would do it:

  1. Find the place where you want to truncate the string
  2. Delete all child nodes after that point
  3. Truncate the string
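
The steps above could be sketched roughly as follows: parse the complete, untruncated HTML, walk the text nodes while counting down a character budget, shorten the text node where the budget runs out, and delete every node after it. The function names truncateHtml() and truncateNode() and the $limit parameter are hypothetical. The sketch counts bytes with strlen(), so it assumes ASCII text, and it assumes a body-level fragment.

```php
<?php
// Hypothetical helpers: names, signatures and the byte-based limit are assumptions.
function truncateHtml(string $html, int $limit): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $remaining = $limit;
    truncateNode($body, $remaining);

    // Serialize only the children of <body>.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

function truncateNode(DOMNode $node, int &$remaining): void
{
    if (!$node->hasChildNodes()) {
        return;
    }

    // Copy the live node list so removals do not disturb the iteration.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            // Budget used up: drop everything after the truncation point.
            $node->removeChild($child);
        } elseif ($child instanceof DOMText) {
            $length = strlen($child->data);
            if ($length > $remaining) {
                // The truncation point falls inside this text node.
                $child->data = substr($child->data, 0, $remaining);
                $remaining = 0;
            } else {
                $remaining -= $length;
            }
        } else {
            truncateNode($child, $remaining);
        }
    }
}

echo truncateHtml(
    '<div><p>The quick brown <a href="#">fox jumps</a> over the lazy dog</p></div>',
    19
);
// Prints: <div><p>The quick brown <a href="#">fox</a></p></div>
```

Since the tree is cut before it is serialized, the output never contains unclosed tags, so no repair step is needed afterwards.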
Franz