I don't have a complete solution.
In my mind there are two or even three stages in such a conversion:
Stage 1: get the HTML5 well formed
There's something of a black art to this first phase: HTML5 does not require markup to be well formed, and that has to be accommodated.
You need this before you have a DOM, and before tools that expect something remotely resembling XML have any chance of working.
So who has implemented such a conversion? (Almost?) every browser. Quite a few have their source code available. You can also get this information out of a running browser:
feed it tag soup, inspect the resulting document, and you get well-structured markup back.
Another place to find such source code is in editors that let you edit XHTML in a web page (FCKeditor and the like).
For example, `<p>para<ul><li>bullet</ul><p>para`
gets changed into `<p>para</p><ul><li>bullet</li></ul><p>para</p>`
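To make the idea concrete, here is a minimal sketch of that kind of repair, using only Python's standard `html.parser`. It is not what any browser does: the class name `SoupCloser`, the function `close_tags`, and the small `CLOSES_P` table are my own illustrative assumptions, and only a handful of the implied-end-tag rules are covered. Comments and doctypes are dropped, and `script`/`style` content is not treated specially.

```python
import html
from html.parser import HTMLParser

# Illustrative subsets only; the real HTML5 rules cover many more elements.
CLOSES_P = {"p", "ul", "ol", "div", "table", "h1", "h2", "h3"}
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SoupCloser(HTMLParser):
    """Re-emit tag soup with the implied end tags written out."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True: text arrives unescaped
        self.out = []       # output fragments
        self.stack = []     # names of the currently open elements

    def _in_scope(self, tag, boundaries):
        # Look for `tag` from the innermost open element outwards,
        # stopping at any element that acts as a scope boundary.
        for name in reversed(self.stack):
            if name == tag:
                return True
            if name in boundaries:
                return False
        return False

    def _close_until(self, tag):
        # Pop and close open elements up to and including `tag`.
        while self.stack:
            top = self.stack.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P and self._in_scope("p", {"table", "td", "th", "button"}):
            self._close_until("p")
        if tag == "li" and self._in_scope("li", {"ul", "ol"}):
            self._close_until("li")
        rendered = "".join(
            ' {}="{}"'.format(name, html.escape(value or "", quote=True))
            for name, value in attrs
        )
        if tag in VOID:
            self.out.append(f"<{tag}{rendered}/>")
        else:
            self.out.append(f"<{tag}{rendered}>")
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # HTML ignores a trailing slash on non-void elements.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        # Stray end tags (nothing matching is open) are ignored.
        if tag in self.stack:
            self._close_until(tag)

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


def close_tags(soup):
    parser = SoupCloser()
    parser.feed(soup)
    parser.close()  # flush any buffered text
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)


print(close_tags("<p>para<ul><li>bullet</ul><p>para"))
```

For anything beyond a toy, a parser that implements the full HTML5 tree-construction algorithm is the safer route; this only shows the shape of the problem.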
Stage 2: filter out what's not allowed in Polyglot
Once the HTML tags are well structured, the next step is to remove whatever is not allowed in polyglot markup because an HTML parser and an XML parser would interpret it differently.
Here you might have a chance with XSLT and a hand-built filter, but you cannot validate the result completely, as there is no DTD or anything equivalent to validate polyglot (X)HTML against. Even the few validators for XHTML5 that existed are being (or have been) scrapped, which makes your quest a difficult one.
Still, tracking down the source of one of those validators is your best bet for finding code that comes close to this.
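As a rough illustration of such a filter (in Python rather than XSLT), the sketch below walks an already well-formed document and reports a few constructs that, as far as I understand the polyglot document, are not allowed: a `noscript` element, a `tr` placed directly inside `table` without a `tbody`, and a void element that has content. The function name `find_polyglot_issues` and the choice of rules are my own assumptions; this is nowhere near a validator, so check each rule against the spec.

```python
import xml.etree.ElementTree as ET

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


def local(name):
    # Strip the "{namespace}" prefix that ElementTree puts on names.
    return name.rsplit("}", 1)[-1]


def find_polyglot_issues(xhtml):
    """Return a list of human-readable problems (assumed rules, see above)."""
    root = ET.fromstring(xhtml)  # raises ParseError if not well formed
    issues = []
    for elem in root.iter():
        name = local(elem.tag)
        if name == "noscript":
            issues.append("noscript element present")
        if name == "table" and any(local(child.tag) == "tr" for child in elem):
            issues.append("tr directly inside table (no explicit tbody)")
        if name in VOID and ((elem.text or "").strip() or len(elem)):
            issues.append(f"void element {name} has content")
    return issues


doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<noscript>x</noscript>"
    "<table><tr><td>1</td></tr></table>"
    "<br>text</br>"
    "</body></html>"
)
for issue in find_polyglot_issues(doc):
    print(issue)
```

Note that a parsed tree cannot tell `<p/>` from `<p></p>`, so the empty-element syntax rules have to be enforced when serializing, not here.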
Stage 3: fix the external entities
Say what? Well, you can have beautiful polyglot (X)HTML, include a single JavaScript file that does a single document.write, and it all still fails. So you'll need to hunt down all of those too before it works.
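A minimal sketch of that hunt, assuming the document is already well formed: it flags inline scripts that call `document.write` and lists external scripts, which still have to be fetched and inspected separately. The helper name `audit_scripts` and the regular expression are illustrative assumptions, and a text match like this will miss calls that are built dynamically.

```python
import re
import xml.etree.ElementTree as ET

# Matches document.write( and document.writeln( with optional whitespace.
DOC_WRITE = re.compile(r"\bdocument\s*\.\s*write(?:ln)?\s*\(")


def audit_scripts(xhtml):
    """Return (inline scripts using document.write, external script URLs)."""
    root = ET.fromstring(xhtml)
    inline_hits, external = [], []
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] != "script":
            continue
        src = elem.get("src")
        if src is not None:
            # Content lives elsewhere; it needs its own inspection.
            external.append(src)
        elif DOC_WRITE.search(elem.text or ""):
            inline_hits.append(elem.text.strip())
    return inline_hits, external


doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head>'
    '<script src="a.js"></script>'
    '<script>document.write("hi");</script>'
    "<script>var x = 1;</script>"
    "</head></html>"
)
hits, external = audit_scripts(doc)
print(hits)
print(external)
```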
` (not `
` or `
`-only)... About "javascript", I do not understand... I am [stripping](http://php.net/manual/en/function.strip-tags.php) scripts (!). – Peter Krauss Sep 15 '15 at 17:36
syntax, but must use
in polyglot html. They are called "void html elements". See section 4.6.1 in the polyglot document over at w3.org you linked in the question already. While others must use the syntax even when empty. – Sep 16 '15 at 11:04