I have a problem with PHP's DOMDocument::validate(), which seems to fetch the DTD over the network on every call.

This is a big problem when I want to validate, for example, an XHTML document as explained here.

Since w3.org seems to reject all requests coming from a PHP server, it's impossible to validate my document with this method.

Is there any solution for this?

Thanks in advance.

[EDIT] Here are some more details:

/var/www/test.php :

<?php
$implementation = new DOMImplementation();

$dtd = $implementation->createDocumentType
       (
         'html',                                     // qualifiedName
         '-//W3C//DTD XHTML 1.0 Transitional//EN',   // publicId
         'http://www.w3.org/TR/xhtml1/DTD/xhtml1-'
           .'transitional.dtd'                       // systemId
       );

$document = $implementation->createDocument('', '', $dtd);

$document->validate();

[http://]127.0.0.1/test.php :

Warning: DOMDocument::validate(http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd): failed to open stream: HTTP request failed! HTTP/1.0 403 Forbidden
 in /var/www/test.php on line 14

Warning: DOMDocument::validate(): I/O warning : failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" in /var/www/test.php on line 14

Warning: DOMDocument::validate(): Could not load the external subset "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" in /var/www/test.php on line 14

Related question :

Pascal Qyy
  • Not sure what your issue is. `DOMDocument::validate` validates the document based on the loaded document's DTD. – Gordon Oct 31 '10 at 10:55
  • For example, if I provide this DTD: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd, then when I call DOMDocument::validate(), PHP sends a request to fetch the file, but w3.org systematically replies with a 403 Forbidden or a 503 Service Unavailable, and PHP gives me the warning: failed to load external entity "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" – Pascal Qyy Oct 31 '10 at 11:06
  • I see, yes. There is a bug open for it: http://bugs.php.net/bug.php?id=48080 – Gordon Oct 31 '10 at 11:17
  • I know, and it's very old, but is there any solution? (The difference from this bug report is that w3.org has deliberately stopped accepting requests from PHP because of this behavior...) – Pascal Qyy Oct 31 '10 at 11:25
  • I guess the only way around this is to have a DTD at the server and then change the systemId to validate against that instead. – Gordon Oct 31 '10 at 11:28
  • At this point, I'd rather change /etc/hosts on my server... But the problem stays the same: no caching, and a request every time I validate... – Pascal Qyy Oct 31 '10 at 11:36
  • I've added a comment to the bug report with your link to the w3 blog post. Maybe that will get it some more attention. Good question btw. – Gordon Oct 31 '10 at 11:51
  • Thank you for your help. So there doesn't seem to be a solution yet... – Pascal Qyy Oct 31 '10 at 11:55
  • I did some more digging. See provided answer below. You can use a stream context. – Gordon Oct 31 '10 at 12:22

1 Answer

As pointed out in the comments, there is an open bug/feature request asking for DOMDocument::validate() to accept the DTD as a string (http://bugs.php.net/bug.php?id=48080).

You could host the DTD yourself and change the systemId accordingly, or you can provide a custom stream context that applies to any loading done through libxml. For instance, setting a user agent gets around the W3C's blocking. You could also configure a proxy this way.

Example:

$di = new DOMImplementation;
$dom = $di->createDocument(
    'html',
    'html',
    $di->createDocumentType(
        'html',
        '-//W3C//DTD XHTML 1.0 Transitional//EN',
        'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd'
    )
);
$opts = array(
    'http' => array(
        'user_agent' => 'PHP libxml agent',
    )
);
$context = stream_context_create($opts);
libxml_set_streams_context($context);

var_dump($dom->validate());

This would output

Warning: DOMDocument::validate(): Element html content does not follow the DTD, expecting (head , body), got  

Warning: DOMDocument::validate(): Element html namespace name for default namespace does not match the DTD 

Warning: DOMDocument::validate(): Value for attribute xmlns of html is different from default "http://www.w3.org/1999/xhtml" 

Warning: DOMDocument::validate(): Value for attribute xmlns of html must be "http://www.w3.org/1999/xhtml" 

bool(false)
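
The other option mentioned above, hosting the DTD yourself and pointing the systemId at it, could look roughly like the sketch below. It is only an illustration: the one-line `note` DTD and the temporary file are hypothetical stand-ins so the snippet runs without any network access. For XHTML you would instead store a local copy of xhtml1-transitional.dtd, together with the entity files it references, and pass that path as the systemId.

```php
<?php
// Hypothetical stand-in DTD written to a temporary file. In practice this
// would be a local copy of the real DTD kept somewhere on the server.
$dtdPath = tempnam(sys_get_temp_dir(), 'dtd');
file_put_contents($dtdPath, "<!ELEMENT note (#PCDATA)>\n");

$di = new DOMImplementation();

// The systemId is a local path, so validate() reads the file from disk
// instead of requesting it from a remote server.
$doctype = $di->createDocumentType('note', '', $dtdPath);
$dom = $di->createDocument('', 'note', $doctype);
$dom->documentElement->appendChild($dom->createTextNode('hello'));

$isValid = $dom->validate();
unlink($dtdPath);

var_dump($isValid);
```

The drawback of this approach is that the systemId written into the document no longer matches the official one, so it suits documents that are only validated on your own server.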
Gordon
  • Very interesting solution! It doesn't solve the problem of a request being sent on every validation with no caching (not very fair to the W3C, although there is no need to validate a document each time it is served), but I can now validate my documents. Thank you ^^ – Pascal Qyy Oct 31 '10 at 12:24
  • @GQyy actually, thanks for asking the question. It made me learn something new about DOM today, too ;) – Gordon Oct 31 '10 at 12:37
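
A possible follow-up on the caching concern raised in the comments: PHP 5.4 and later (so newer than this thread) provide libxml_set_external_entity_loader(), which lets a callback decide where an external entity is read from. The sketch below is a minimal illustration under stated assumptions, not a drop-in fix: the `foo` DTD string, the public identifier and the example.com system identifier are hypothetical placeholders, and a real setup would return a stream for a locally cached copy of the XHTML DTD instead.

```php
<?php
// Hypothetical stand-in for a locally cached DTD.
$localDtd = '<!ELEMENT foo (#PCDATA)>';

// Every external entity libxml asks for is answered from memory here,
// so no HTTP request is sent. A real loader would inspect $public and
// $system and map them to the matching local file.
libxml_set_external_entity_loader(
    function ($public, $system, $context) use ($localDtd) {
        $stream = fopen('php://temp', 'r+');
        fwrite($stream, $localDtd);
        rewind($stream);
        return $stream;
    }
);

$dom = new DOMDocument();
$dom->loadXML(
    '<!DOCTYPE foo PUBLIC "-//FOO/BAR" "http://example.com/foobar">'
    . '<foo>bar</foo>'
);

$isValid = $dom->validate();
var_dump($isValid);
```

Unlike changing the systemId, this keeps the official identifiers in the document while the DTD itself is served locally.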