I am fetching HTML from a smarty template and need to clean it (simply want to remove extra whitespace, and format / indent the HTML nicely), I'm using tidy to do something like:
$html = $smarty->fetch('foo.tmpl');
$tidy = new tidy;
$tidy->parseString($html, array(
'hide-comments' => TRUE,
'output-xhtml' => TRUE,
'indent' => TRUE,
'wrap' => 0
));
$tidy->cleanRepair();
return $tidy;
While this works ok for english, multilingual support seems to break this. For example, I have arabic characters ok in $html, but after tidy I get back some nasty encoding:
هل أنت متأكد أنك تريد
Is there a setting in tidy that will format the HTML, but leave the HTML itself alone? I looked at this post: PHP "pretty print" HTML (not Tidy) but it's seems like this won't work since I'm grabbing my HTML from smarty.
Any suggestions appreciated.