Is there some way to differentiate XML from HTML with PHP's DOMDocument?

I looked in the docs and didn't find anything.

I'm looking for a function like check($string) that returns 'is XML' or 'is HTML' for each $string.

The similar questions here on SO didn't help me.

James

2 Answers

There is no such function, but you can be sure that a $string is well-formed XML when DOMDocument::loadXML() returns true (with the recover property set to false). An HTML document that is not also well-formed XML fails that check.

For HTML you can use DOMDocument::loadHTML() to check whether a document can be loaded as HTML. The HTML parser is far less strict than the XML parser, so it accepts most input.
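
A minimal sketch of the two checks combined, assuming a helper named check() as in the question (it is not part of DOMDocument) and a third return value, 'is neither', which is my own addition for input that neither parser accepts. libxml warnings are collected internally instead of being printed.

```php
<?php
// Hypothetical helper: try the strict XML parser first, then the lenient HTML parser.
function check(string $string): string
{
    // Guard against empty input, which neither parser accepts.
    if (trim($string) === '') {
        return 'is neither';
    }

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->recover = false; // do not try to repair malformed XML
    $isXml = $dom->loadXML($string);

    $isHtml = false;
    if (!$isXml) {
        // The HTML parser is lenient, so this accepts most non-empty input.
        $html = new DOMDocument();
        $isHtml = $html->loadHTML($string);
    }

    // Restore the previous libxml error handling.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($isXml) {
        return 'is XML';
    }
    return $isHtml ? 'is HTML' : 'is neither';
}
```

Because the HTML parser accepts almost anything, 'is HTML' here means only "not well-formed XML, but loadable as HTML".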

hakre
  • Thx @hakre. It looks right, but the code `$dom = new DOMDocument(); $var = $dom->loadXML("Test"); print_r ($var);die();` returns 1. What is wrong? – James Aug 07 '15 at 19:16
  • It should return `bool(true)`, see here: https://eval.in/413856 - And that's fine as the string *is* well-formed XML. – hakre Aug 07 '15 at 19:21
  • Actually, you're right. I did not notice that the string is well-formed XML. I ran a test with other HTML and it works like a charm, returning `bool(false)` – James Aug 07 '15 at 19:30
  • An HTML document can also be well-formed XML. In that case you may also want to check whether the `->documentElement` field's `DOMElement::$tagName` is "`html`". Compare case-insensitively. That would be a strong signal that this is an HTML document. – hakre Aug 07 '15 at 20:46
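
The root-element check from the last comment could look roughly like this. The function name looksLikeHtmlRoot() is hypothetical, and the sketch assumes the input has no namespace prefix on the root element, since `tagName` holds the qualified name.

```php
<?php
// Hypothetical helper: true when $string is well-formed XML whose root element is <html>.
function looksLikeHtmlRoot(string $string): bool
{
    // Guard against empty input, which loadXML() does not accept.
    if (trim($string) === '') {
        return false;
    }

    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($string);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded || $dom->documentElement === null) {
        return false;
    }

    // XML preserves the tag name's case, so compare case-insensitively.
    return strcasecmp($dom->documentElement->tagName, 'html') === 0;
}
```

A string that passes loadXML() and also passes this check is most likely XHTML or otherwise well-formed HTML rather than generic XML.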

Use the preg_match() function. Example:

if (preg_match('/<html[^>]*>/i', $string)) {
  // ... actions for HTML ...
} elseif (preg_match('/<\?xml[^?]*\?>/', $string)) {
  // ... actions for XML ...
} else {
  // ... actions for anything else ...
}
Quazer