
What is the best practice for repairing malformed XML files with PHP? For example, a CDATA section contains illegal characters. Should I use regular expressions? Or run some Linux command-line tools?

Ain
  • This question is pretty *vague*; maybe you should describe in more depth the kind of malformed documents you have to deal with. XML is **extremely** general in scope, so a general solution is not really feasible. – ZJR Sep 26 '10 at 08:52
  • XML parsers are pretty strict. Some preprocessing may sometimes ease that, but to get an answer you need to provide more details. A far-fetched guess: by **XML**, do you perhaps really mean **XHTML**? – ZJR Sep 26 '10 at 08:55
  • "with regular expressions?" Certainly not. See http://stackoverflow.com/questions/701166/can-you-provide-some-examples-of-why-it-is-hard-to-parse-xml-and-html-with-a-rege –  Sep 26 '10 at 09:08
  • Thanks. Specifically, the problem is illegal characters (unescaped entities), e.g. `Me myself & I`; sometimes the content is also directly fetched, unescaped HTML code, e.g. `Some important content here`. – Ain Sep 26 '10 at 10:40
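For the specific case in the comment above (bare `&` characters), one narrow preprocessing step is to escape every `&` that does not already begin a character or entity reference, before handing the string to a parser. This is a minimal sketch, assuming unescaped ampersands are the only defect; `escape_bare_ampersands()` and the `<item>` wrapper are hypothetical names. It is a targeted text substitution, not regex-based parsing, so it does not deal with stray `<` characters or HTML-only entities such as `&nbsp;`, which XML does not define.

```php
<?php
// Hypothetical helper: replace every "&" that does not already start a
// character reference (&#38; or &#x26;) or a named entity reference (&amp;)
// with "&amp;". Everything else in the string is left untouched.
function escape_bare_ampersands(string $xml): string
{
    return preg_replace(
        '/&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml
    );
}

// Hypothetical input modelled on the example from the comment.
$broken = '<item>Me myself & I</item>';
$fixed  = escape_bare_ampersands($broken);
echo $fixed, PHP_EOL; // <item>Me myself &amp; I</item>

// Check that the result is now well-formed (ext/dom, enabled by default).
libxml_use_internal_errors(true); // collect parse errors instead of printing warnings
$dom = new DOMDocument();
var_dump($dom->loadXML($fixed)); // bool(true)
```

If `loadXML()` still returns `false` on real data, `libxml_get_errors()` lists what remains broken.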

1 Answer


Tidy

Tidy is a PHP binding for the Tidy HTML clean-and-repair utility. It lets you not only clean and otherwise manipulate HTML documents but also traverse the document tree, and with the `input-xml` and `output-xml` options it can repair XML as well.

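```php
<?php
// Treat the input as XML (not HTML), emit XML, indent it, and don't wrap lines
$config = array(
    'indent'     => true,
    'input-xml'  => true,
    'output-xml' => true,
    'wrap'       => false);

// Parse the file and let Tidy repair it
$tidy = new tidy;
$tidy->parseFile('sample.xml', $config);
$tidy->cleanRepair();

// Print the repaired document
echo $tidy;
```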
Mads Hansen
  • This is perfect. But I need to save the repaired string to a file. I tried `file_put_contents("new.xml",$tidy)`, but new.xml is created empty. – vidhya Jan 02 '15 at 08:39
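The empty file in the last comment is most likely because `file_put_contents()` accepts a string, an array or a stream resource as its data, and does not seem to apply the string conversion that `echo` uses for a `tidy` object. A minimal sketch that converts explicitly with `tidy_get_output()` before writing; it assumes the tidy extension is installed, and the temporary paths and sample content are hypothetical.

```php
<?php
// Hypothetical input/output paths and sample content for this sketch.
$in  = sys_get_temp_dir() . '/sample.xml';
$out = sys_get_temp_dir() . '/new.xml';
file_put_contents($in, '<root><item>Me myself & I</item></root>');

$config = [
    'indent'     => true,
    'input-xml'  => true,
    'output-xml' => true,
    'wrap'       => 0, // 0 disables line wrapping
];

$tidy = new tidy();
$tidy->parseFile($in, $config);
$tidy->cleanRepair();

// Convert the repaired document to a string explicitly before writing it;
// (string) $tidy should work here as well.
$repaired = tidy_get_output($tidy);
$bytes    = file_put_contents($out, $repaired);
```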