I am trying to load an HTML page from a remote server into a PHP script, which should manipulate the HTML with the DOMDocument class. However, I have noticed that DOMDocument removes parts of the JavaScript that comes with the HTML page. For example, the page contains code like this:
<script type="text/javascript">
//...
function printJSPage() {
var printwin=window.open('','haha','top=100,left=100,width=800,height=600');
printwin.document.writeln(' <table border="0" cellspacing="5" cellpadding="0" width="100%">');
printwin.document.writeln(' <tr>');
printwin.document.writeln(' <td align="left" valign="bottom">');
//...
printwin.document.writeln('</td>');
//...
}
</script>
However, DOMDocument changes, for example, the line
printwin.document.writeln('</td>');
to
printwin.document.writeln(' ');
and it alters many other things as well (e.g. the last script tag is no longer there). As a result I get a completely broken page that I cannot pass on.
So I suspect that DOMDocument has problems with the HTML tags inside the JavaScript code and tries to correct them to produce a well-formed document. Can I prevent DOMDocument from parsing the JavaScript?
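A likely cause (not confirmed by the output above): DOMDocument::loadHTML() uses libxml2's HTML parser, and many libxml2 versions follow the HTML 4 rule that script content ends at the first "</" followed by a letter. The '</td>' inside the JavaScript string would then close the script element early (libxml2 typically reports this as "Element script embeds close tag"). One common workaround, sketched below under that assumption, is to escape "</" as "<\/" inside script bodies before parsing; inside a JavaScript string literal both spellings produce the same text. The helper name escapeScriptEndTags is hypothetical, and the regex approach is only safe if "</" appears inside string literals in the scripts, not in ordinary script code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: escape "</" inside <script> bodies as "<\/" so that an
 * HTML 4 style parser (assumed here: libxml2 behind DOMDocument::loadHTML)
 * does not treat it as the end of the script element.
 * Assumption: "</" only occurs inside JavaScript string literals.
 */
function escapeScriptEndTags(string $html): string
{
    $result = preg_replace_callback(
        // Group 1: opening tag, group 2: script body, group 3: closing tag.
        '#(<script\b[^>]*>)(.*?)(</script\s*>)#is',
        static function (array $m): string {
            // In a JS string literal, '<\/td>' evaluates to the same text as '</td>'.
            return $m[1] . str_replace('</', '<\/', $m[2]) . $m[3];
        },
        $html
    );

    // preg_replace_callback() returns null on failure (e.g. backtrack limit);
    // in that case fall back to the unmodified input.
    return $result ?? $html;
}
```

With the fragment from the question this would become @$dom->loadHTML(escapeScriptEndTags($stdin)); the saved HTML then contains '<\/td>' instead of '</td>', which is the same string from JavaScript's point of view.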
The PHP code fragment is:
$stdin = file_get_contents('php://stdin');
$dom = new \DOMDocument();
@$dom->loadHTML($stdin);
return $dom->saveHTML(); // will produce wrong HTML
//return $stdin; // will produce correct HTML
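The @ operator hides every warning libxml emits, which makes it hard to see where the parser goes wrong. A minimal sketch, assuming PHP 8, that collects the diagnostics with libxml_use_internal_errors() and libxml_get_errors() instead; the function name loadHtmlWithDiagnostics is hypothetical. For the page above, the messages should point at the script element, though the exact wording depends on the libxml2 version.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: parse HTML with DOMDocument and return the libxml
 * diagnostics instead of silencing them with "@".
 *
 * @return array{0: DOMDocument, 1: list<string>}
 */
function loadHtmlWithDiagnostics(string $html): array
{
    $dom = new DOMDocument();
    if ($html === '') {
        // PHP 8 throws a ValueError when loadHTML() receives an empty string.
        return [$dom, ['empty input']];
    }

    $previous = libxml_use_internal_errors(true); // buffer errors instead of emitting warnings
    libxml_clear_errors();                        // drop errors left over from earlier calls

    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        // Each entry is a LibXMLError with line, column, code and message.
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting
    return [$dom, $messages];
}
```

Used with the question's input, [$dom, $messages] = loadHtmlWithDiagnostics($stdin); followed by writing each message to STDERR shows what the parser complained about, while $dom->saveHTML() still works as before.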
I saved both HTML versions and compared them with Meld.
I also tried
@$dom->loadXML($stdin);
return $dom->saveHTML();
but then I don't get anything back from the object.
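A probable explanation for the empty result: loadXML() requires well-formed XML, and a typical HTML page (void elements such as <br> without a closing slash, entities such as &nbsp;, unclosed tags) is not. In that case loadXML() returns false and leaves the document empty, and the @ hides the error. A sketch, assuming PHP 8, with a hypothetical helper tryLoadXml that checks the return value instead of ignoring it:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: try to parse markup as XML and return null when it is
 * not well-formed, instead of silently ending up with an empty document.
 */
function tryLoadXml(string $markup): ?DOMDocument
{
    if ($markup === '') {
        return null; // loadXML('') throws a ValueError in PHP 8
    }

    $previous = libxml_use_internal_errors(true); // keep parse errors out of the output
    $dom = new DOMDocument();
    $ok = $dom->loadXML($markup);                 // false if the input is not well-formed XML
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the caller's setting

    return $ok ? $dom : null;
}
```

If this returns null for the page from the question, the markup is not valid XML and has to go through loadHTML() (or another HTML-aware parser) rather than loadXML().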