I've already looked into this question: Why do browsers still inject <tbody> in HTML5?, which explains why the DOM parser adds <tbody> tags to the DOM if the parsed table doesn't already have them.
I don't have a problem with the <tbody> tag being added, but I do see an issue in IE 11, where two end tags (</tbody></tbody>) are added to the serialized output even though there is only one <tbody> start tag. This breaks my application because the resulting XML is no longer valid XHTML.
// Input markup: a table without an explicit <tbody>.
var html = '<html><head><title>Serializer differences</title></head><body> <table> <tr> <td> <h2>SOFTWARE </h2> </td> <td> Some Text </td> </tr></table></body></html>';
var domParser = new DOMParser();
var xmlSerializer = new XMLSerializer();
// Parse with the HTML parser, then serialize the document as XML.
var doc = domParser.parseFromString(html, 'text/html');
console.log(xmlSerializer.serializeToString(doc));
You can play with the fiddle here: http://jsfiddle.net/bskinnersf/aSUX7/10/
On IE11, the output is:
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Serializer differences</title></head><body> <table> <tbody><tr> <td> <h2>SOFTWARE </h2> </td> <td> Some Text </td> </tr>
</tbody></tbody>
</table></body></html>
Chrome, Firefox, and Canary output only a single </tbody> tag, as expected.
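To make the mismatch easier to spot when comparing browsers, a small check along these lines could count the start and end tags in the serialized string. This is only a sketch: countTags is a hypothetical helper, not a browser API, and it relies on simple regular expressions rather than a real parser, so it ignores self-closing tags and would be fooled by tag-like text inside comments or CDATA.

```javascript
// Hypothetical helper (not part of any browser API): counts start and end
// tags for one element name in a serialized string.
function countTags(xml, name) {
  var startRe = new RegExp('<' + name + '(\\s[^>]*)?>', 'gi');
  var endRe = new RegExp('</' + name + '\\s*>', 'gi');
  return {
    start: (xml.match(startRe) || []).length,
    end: (xml.match(endRe) || []).length
  };
}

// Trimmed version of the IE11 output shown above.
var ie11Output = '<table> <tbody><tr> <td> Some Text </td> </tr>\n</tbody></tbody>\n</table>';
var counts = countTags(ie11Output, 'tbody');
console.log(counts.start, counts.end); // 1 2
```

In the browser, the same call could be made on the result of xmlSerializer.serializeToString(doc) to flag output where the two counts differ.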
The input HTML that I'm using is not under my control and was, unfortunately, created with MS Word. I've tried using parseFromString(html, 'application/xhtml+xml'), but it has numerous issues with the HTML that MS Word produces (surprise!).
Is there anything else I can do in my JavaScript parsing to prevent this double </tbody> end tag?
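For reference, one possible stopgap is to post-process the serialized string and collapse any run of consecutive </tbody> end tags into one. The sketch below is an assumption, not a verified fix for the IE11 behaviour: collapseDuplicateTbodyEnds is a hypothetical name, and it works purely on the string, so it would also rewrite the same sequence if it appeared inside a comment or CDATA section. It assumes that valid XHTML never nests one <tbody> directly inside another, so two adjacent </tbody> tags should not occur legitimately.

```javascript
// Hypothetical stopgap: collapse a run of consecutive </tbody> end tags
// (optionally separated by whitespace) into a single one.
function collapseDuplicateTbodyEnds(xml) {
  return xml.replace(/(<\/tbody>)(?:\s*<\/tbody>)+/gi, '$1');
}

// Example using the tail of the IE11 output shown above.
var broken = '</tr>\n</tbody></tbody>\n</table>';
console.log(collapseDuplicateTbodyEnds(broken));
```

Output that already has a single </tbody> passes through unchanged, so the same call could be applied in every browser.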