What is the best practice for repairing malformed XML files with PHP? For example, a CDATA section contains illegal characters. Should I use regular expressions, or run some Linux command-line tools?
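For the "illegal characters in CDATA" case, one narrow step a regular expression can do safely is removing characters that XML 1.0 does not allow anywhere in a document, such as most C0 control characters. Escaping cannot help with these, because XML 1.0 does not accept them even as character references. The regex only filters characters and does not try to parse any markup. This is a minimal sketch that assumes UTF-8 input. The function name `strip_invalid_xml_chars()` is hypothetical.

```php
<?php
// Sketch: remove characters forbidden by the XML 1.0 "Char" production.
// Assumes UTF-8 input; preg_replace() with /u returns null on invalid UTF-8.
function strip_invalid_xml_chars(string $xml): string
{
    // Allowed: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    $pattern = '/[^\x{9}\x{A}\x{D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u';
    $clean = preg_replace($pattern, '', $xml);
    if ($clean === null) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return $clean;
}

// A CDATA section containing a BEL (0x07) and a NUL (0x00) character.
$broken = "<doc><![CDATA[bell\x07 and null\x00 inside]]></doc>";
$fixed  = strip_invalid_xml_chars($broken);

libxml_use_internal_errors(true);
$doc = new DOMDocument();
var_dump($doc->loadXML($fixed)); // bool(true)
```

This deletes the offending characters instead of replacing them. If they carry meaning in your data, you would need a different, application-specific mapping.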
- This question is pretty *vague*; maybe you should specify in depth the kind of malformed documents you have to deal with. XML is **extremely** general in scope, so a general solution is not really feasible. – ZJR Sep 26 '10 at 08:52
- XML parsers are pretty strict. Some preprocessing may sometimes ease that, but to get an answer you need to provide more details. A far-fetched guess: by **XML**, do you really mean **XHTML**? – ZJR Sep 26 '10 at 08:55
- "With regular expressions?" Certainly not. See http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege – Sep 26 '10 at 09:08
- Thanks. Specifically, the problem is illegal characters (unescaped entities), e.g. `Me myself & I`. Sometimes there is also HTML code that was fetched directly and left unescaped, e.g. `Some important content here`. – Ain Sep 26 '10 at 10:40
1 Answer
PHP's Tidy extension is a binding for the Tidy HTML clean and repair utility. It lets you not only clean and otherwise manipulate HTML documents but also traverse the document tree. With the `input-xml` and `output-xml` options, it can repair XML documents as well:
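// Specify configuration
$config = array(
    'indent'     => true,   // indent nested elements in the output
    'input-xml'  => true,   // parse the input as XML rather than HTML
    'output-xml' => true,   // write the output as well-formed XML
    'wrap'       => false); // no line wrapping (equivalent to wrap = 0)
// Tidy (requires the tidy extension)
$tidy = new tidy;
$tidy->parseFile('sample.xml', $config);
$tidy->cleanRepair();
// Output
echo $tidy;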
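The question's comments describe a specific problem: bare `&` characters, as in `Me myself & I`. If the tidy extension is not available, a narrower alternative is a targeted substitution followed by a well-formedness check with libxml. This is a minimal sketch, not a general repair tool. The helper names `escape_bare_ampersands()` and `xml_errors()` are hypothetical. The sketch assumes that only the five predefined XML entities and numeric character references should survive. It does nothing about unescaped `<` from embedded HTML, which tidy or a CDATA wrapper would have to handle.

```php
<?php
// Sketch: escape "&" characters that do not start one of XML's five
// predefined entities or a numeric character reference. Named HTML entities
// such as &nbsp; are not predefined in XML, so they get escaped too and will
// then appear literally in the text.
function escape_bare_ampersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)/',
        '&amp;',
        $xml
    );
}

// Returns libxml error messages for $xml (an empty array if well-formed).
function xml_errors(string $xml): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();
    $doc = new DOMDocument();
    $doc->loadXML($xml);
    $messages = array();
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

$input = '<content>Me myself & I &amp; &#169;</content>';
$fixed = escape_bare_ampersands($input);
echo $fixed, "\n"; // <content>Me myself &amp; I &amp; &#169;</content>
echo count(xml_errors($input)) > 0 ? "malformed\n" : "well-formed\n"; // malformed
echo count(xml_errors($fixed)), "\n"; // 0
```

Escaping rather than deleting keeps the text content intact. Running the libxml check afterwards shows whether other kinds of damage remain.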

Mads Hansen
- This is perfect. But I need to save the repaired string into a file. I tried `file_put_contents("new.xml",$tidy)`, but new.xml is created with no contents. – vidhya Jan 02 '15 at 08:39
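As the last comment reports, passing the tidy object itself to `file_put_contents()` produced an empty file. One way to avoid this, assuming the tidy extension is loaded, is to work with plain strings. `tidy_repair_file()` returns the repaired document as a string, and `tidy_get_output($tidy)` does the same for the object-based code in the answer. This is a minimal sketch. The function `save_repaired_xml()` and the file names are hypothetical.

```php
<?php
// Sketch: repair an XML file with the tidy extension and write the result
// to disk. Assumes ext/tidy is installed; file names are placeholders.
function save_repaired_xml(string $inFile, string $outFile): int
{
    if (!extension_loaded('tidy')) {
        throw new RuntimeException('The tidy extension is not loaded');
    }
    $config = array(
        'indent'     => true,
        'input-xml'  => true,
        'output-xml' => true,
        'wrap'       => 0,
    );
    // tidy_repair_file() returns the repaired document as a string (or false),
    // so file_put_contents() receives plain text rather than a tidy object.
    $repaired = tidy_repair_file($inFile, $config);
    if ($repaired === false) {
        throw new RuntimeException("Could not repair $inFile");
    }
    $bytes = file_put_contents($outFile, $repaired);
    if ($bytes === false) {
        throw new RuntimeException("Could not write $outFile");
    }
    return $bytes;
}

// With the object-based code from the answer, convert explicitly instead:
// file_put_contents('new.xml', tidy_get_output($tidy));
```

For example, `save_repaired_xml('sample.xml', 'new.xml')` would write the repaired document and return the number of bytes written.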