Consider the following setup of HTML Purifier:
require_once 'library/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$config->set('Core.EscapeInvalidTags', true);
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
If you run the following case:
$dirty_html = "<p>lorem <script>ipsum</script></p>";
//output
<p>lorem &lt;script&gt;ipsum&lt;/script&gt;</p>
As expected, instead of removing the invalid tags, it just escaped them all.
However, consider these other test cases:
case 1
$dirty_html = "<p>lorem <b>ipsum</p>";
//output
<p>lorem <b>ipsum</b></p>
//desired output
<p>lorem &lt;b&gt;ipsum</p>
case 2
$dirty_html = "<p>lorem ipsum</b></p>";
//output
<p>lorem ipsum</p>
//desired output
<p>lorem ipsum&lt;/b&gt;</p>
case 3
$dirty_html = "<p>lorem ipsum<script></script></p>";
//output
<p>lorem ipsum&lt;script /&gt;</p>
//desired output
<p>lorem ipsum&lt;script&gt;&lt;/script&gt;</p>
Instead of just escaping the invalid tags, HTML Purifier first repairs the markup and only then escapes what is still invalid. This can produce very strange results, for example:
case 4
$dirty_html = "<p><a href='...'><div>Text</div></a></p>";
//output
<p><a href="..."></a></p><div><a href="...">Text</a></div><a href="..."></a>&lt;/p&gt;
Question
Therefore, is it possible to disable the syntax repair and just escape the invalid tags?
`Consider the inequations z<x and y>w`. However, when passing this into HTML Purifier things get messy: `Consider the inequations z<x and="" y="">w</x>` – Mark Messa Jan 22 '18 at 15:03
`solve the inequations z<x and y>w` and HTML Purifier would maintain that. – Mark Messa Jan 22 '18 at 15:06
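One possible workaround for the inequation use case is to escape unknown tag-like sequences before the text reaches HTML Purifier, so the parser never sees `<x and y>` as an element. The following is only a minimal sketch: `escape_unknown_tags` and the allowlist passed to it are hypothetical and not part of HTML Purifier, and the approach does not address the unbalanced-tag cases (case 1 and case 2), where the tag name itself is allowed.

```php
<?php
// Hypothetical helper, NOT part of HTML Purifier: escape every tag-like
// sequence whose element name is not in $allowed and leave the rest as-is.
function escape_unknown_tags(string $html, array $allowed): string
{
    return preg_replace_callback(
        // "<", optional "/", an element name, then anything up to the next ">"
        '~<(/?)([A-Za-z][A-Za-z0-9]*)([^<>]*)>~',
        function (array $m) use ($allowed): string {
            if (in_array(strtolower($m[2]), $allowed, true)) {
                return $m[0]; // allowed element: pass through unchanged
            }
            // unknown element: turn "<" and ">" into entities
            return htmlspecialchars($m[0], ENT_NOQUOTES, 'UTF-8');
        },
        $html
    );
}

// Assumed allowlist, for illustration only.
echo escape_unknown_tags('solve the inequations z<x and y>w', ['p', 'b', 'a', 'div']), "\n";
// prints: solve the inequations z&lt;x and y&gt;w
```

The result could then be passed to `$purifier->purify()` as usual. A lone `<` that is not followed by a name and a closing `>` is left untouched by this sketch.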