
I'm trying to find paragraphs with the id "test" and remove them from an HTML string. I've tried using PHP's DOMDocument, but the HTML I'm searching is badly formed and I get errors:

$caption = "blah blah<p id ='test'>Test message</p>";
$doc = new DOMDocument();
$doc->loadHTMLFile($caption);
$xmessage = $doc->getElementById('test');

This produces:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : br i

Is there a way to suppress the warnings? Thanks

Asked by HiSpec, edited by Syscall

4 Answers


You can use the following code to remove a paragraph with id='test':

$caption = "blah blah<p id='test'>Test message</p><p id='foo'>Foo Bar</p>";
$doc = new DOMDocument();
$doc->loadHTML($caption);
$xpath = new DOMXPath($doc);
$nlist = $xpath->query("//p[@id='test']");
$node = $nlist->item(0);
echo "Para: [" . $node->nodeValue . "]\n";
$node->parentNode->removeChild($node);
echo "Remaining: [" . $doc->saveHTML() . "]\n";

OUTPUT:

Para: [Test message]
Remaining: [<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<p>blah blah</p>
<p id="foo">Foo Bar</p>
</body></html>
]
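The question mentions removing paragraphs (plural) from badly formed HTML. A minimal sketch under those assumptions extends the XPath approach above so it removes every matching `<p>`, and buffers libxml's parse errors with `libxml_use_internal_errors()` instead of letting them surface as PHP warnings. The function name `removeParagraphsById` and the sample markup (including the stray `</br>`) are hypothetical; the sketch also assumes the id contains no single quote, because it is spliced directly into the XPath expression.

```php
<?php
// Sketch: remove every <p> with a given id from a possibly malformed HTML fragment.
// removeParagraphsById is a hypothetical helper name used for illustration.
function removeParagraphsById(string $html, string $id): string
{
    $doc = new DOMDocument();

    // Buffer libxml parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();                 // discard the buffered errors
    libxml_use_internal_errors($previous); // restore the caller's setting

    $xpath = new DOMXPath($doc);
    // Assumption: $id contains no single quote, since it is inserted verbatim.
    $nodes = $xpath->query("//p[@id='" . $id . "']");

    // Copy the matches into an array first, then detach each one from its parent.
    foreach (iterator_to_array($nodes) as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveHTML();
}

$caption = "blah blah<p id='test'>Test message</p></br><p id='test'>Again</p><p id='foo'>Foo Bar</p>";
$result = removeParagraphsById($caption, 'test');
echo $result;
```

Duplicate ids are themselves invalid HTML, which is one more reason to select by XPath rather than assume a single match.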
Answer by anubhava

Don't use loadHTMLFile(); use loadHTML().

loadHTMLFile() expects a file name, while loadHTML() expects an HTML string, which is what you are providing. Doing so should correct the warning.
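To make the difference concrete, here is a small sketch that parses the same markup both ways: directly from the string with `loadHTML()`, and from a temporary file with `loadHTMLFile()`. The temporary file exists only for illustration; when the markup is already in a string, `loadHTML()` is the call you want.

```php
<?php
// loadHTML() parses a string; loadHTMLFile() reads markup from a file path.
$caption = "blah blah<p id='test'>Test message</p>";

// Parse the string directly.
$fromString = new DOMDocument();
$fromString->loadHTML($caption);

// loadHTMLFile() needs the markup on disk, so write it to a temporary file first.
$path = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($path, $caption);
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);
unlink($path); // clean up the temporary file

// Both documents should contain the same paragraph.
$a = (new DOMXPath($fromString))->query("//p[@id='test']")->item(0)->textContent;
$b = (new DOMXPath($fromFile))->query("//p[@id='test']")->item(0)->textContent;
echo $a, "\n", $b, "\n";
```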

Answer by Jason McCreary, edited by gen_Eric

getElementById requires the HTML to be validated before it'll work. See this StackOverflow answer for more info.

$caption = "blah blah<p id ='test'>Test message</p>";
$doc = new DOMDocument;
$doc->validateOnParse = true;  // validate HTML
$doc->loadHTML($caption);  // This loads an HTML string
$xmessage = $doc->getElementById('test');

(NOTE: You need to use loadHTML, not loadHTMLFile).

This still may not work, as the HTML may not be valid.

If this doesn't work, I suggest using DOMXPath.

$caption = "blah blah<p id ='test'>Test message</p>";
$doc = new DOMDocument;
$doc->loadHTML($caption);  // loadHTML, not loadHTMLFile, since $caption is a string
$xpath = new DOMXPath($doc);
$xmessage = $xpath->query("//p[@id='test']")->item(0);
Answer by gen_Eric, edited by Community

There's more than one paragraph with the same ID? Surely not...

It's generally bad practice (the warnings are there for a reason), but you can suppress warnings using @. I'm not 100% certain it works on method calls on an object like this, so let me know if it does!

$caption = "blah blah<p id ='test'>Test message</p>";
$doc = new DOMDocument();
@$doc->loadHTML($caption);
$xmessage = @$doc->getElementById('test');
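If the goal is only to keep parse warnings out of the output, `libxml_use_internal_errors(true)` is a commonly used alternative to `@`: libxml buffers the errors, and `libxml_get_errors()` lets you inspect them before `libxml_clear_errors()` discards them. The sketch below assumes a stray `</br>` is the kind of markup behind the original "Unexpected end tag : br" warning; the exact messages collected depend on the libxml version.

```php
<?php
// Sketch: silence libxml parse warnings without the @ operator,
// while still being able to inspect what the parser complained about.
$caption = "blah blah<p id='test'>Test message</p></br>";

$doc = new DOMDocument();
$previous = libxml_use_internal_errors(true); // buffer errors instead of warning
$doc->loadHTML($caption);

// Each LibXMLError has level, code, message, line, etc.; keep only the messages.
$messages = array();
foreach (libxml_get_errors() as $error) {
    $messages[] = trim($error->message);
}
libxml_clear_errors();                 // empty the error buffer
libxml_use_internal_errors($previous); // restore the previous setting

$xpath = new DOMXPath($doc);
$paragraph = $xpath->query("//p[@id='test']")->item(0);
echo $paragraph->nodeValue, "\n";
echo count($messages), " parser message(s) suppressed\n";
```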
Answer by Nick
Comment: Using the `@` works fine here, but the problem was that he was using the wrong method. `loadHTMLFile` expects a file name; he wanted `loadHTML`, which takes a string of HTML. – gen_Eric Jan 10 '12 at 16:18

Comment: Thanks, but I get "Catchable fatal error: Object of class DOMElement could not be converted to string in"
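The fatal error in the last comment most likely comes from echoing the element itself: a `DOMElement` has no string conversion, so it has to be turned into text first. A sketch, locating the element with DOMXPath as in the answers above, reads its text with `textContent` or serializes the whole element with `DOMDocument::saveHTML($node)`.

```php
<?php
// A DOMElement cannot be echoed directly; read its text or serialize it instead.
$caption = "blah blah<p id='test'>Test message</p>";

$doc = new DOMDocument();
$doc->loadHTML($caption);
$xpath = new DOMXPath($doc);
$xmessage = $xpath->query("//p[@id='test']")->item(0);

if ($xmessage !== null) {
    $text = $xmessage->textContent;      // "Test message"
    $markup = $doc->saveHTML($xmessage); // the element's own outer HTML
    echo $text, "\n", $markup, "\n";
}
```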