Why does IE inject an extra end tag into my table when parsing and serializing HTML?

Question

I've already looked at the question "Why do browsers still inject <tbody> in HTML5?", which explains why DOM parsing adds <tbody> tags to the DOM when the parsed table doesn't already have them.

I don't have a problem with the <tbody> tag being added, but I do see an issue in IE 11: two end tags, </tbody></tbody>, are added to the output even though there is only one <tbody> start tag. This breaks my application because the resulting XML is no longer valid XHTML.

// Sample input: a table with no explicit <tbody>
var html = '<html><head><title>Serializer differences</title></head><body> <table> <tr> <td>  <h2>SOFTWARE </h2>  </td>  <td>  Some Text  </td> </tr></table></body></html>';

var domParser = new DOMParser();
var xmlSerializer = new XMLSerializer();

// Parse as HTML, then serialize the resulting document
var doc = domParser.parseFromString(html, 'text/html');
console.log(xmlSerializer.serializeToString(doc));

You can play with the fiddle here: http://jsfiddle.net/bskinnersf/aSUX7/10/

On IE11, the output is: <html xmlns="http://www.w3.org/1999/xhtml"><head><title>Serializer differences</title></head><body> <table> <tbody><tr> <td> <h2>SOFTWARE </h2> </td> <td> Some Text </td> </tr> </tbody></tbody> </table></body></html>

Chrome, Firefox, and Canary output only the single </tbody> tag, as expected.

The input HTML that I'm using is not under my control and, unfortunately, was created with MS Word. I've tried using parseFromString(html, 'application/xhtml+xml'), but it has numerous issues with MS Word-produced HTML (surprise!).

Is there anything else I can do in my JavaScript parsing to prevent this double tbody end tag?

Do you have the same issue if you just use tbody with both tbody and /tbody tags? That issue with IE 11 looks like a bug, but you shouldn't worry about a bug in a browser that's trying to actually fix your code; just write good code! — user3417400, Apr 28 '14 at 18:02
The issue goes away under those conditions; the problem is that I don't have control over the input files where those table edits would need to be made. The user uploads a terrible MS Word-produced HTML file, and I'm trying to convert it to good XHTML using the parser and serializer classes. — bskinnersf, Apr 28 '14 at 20:05
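
Since the input can't be edited, one possible stopgap is to clean up the serialized string rather than the source. The sketch below is only an illustration, not a confirmed fix for the IE 11 behavior: the helper name collapseDuplicateTbodyEndTags is hypothetical, and it assumes the stray tag always appears as a run of consecutive </tbody> end tags (optionally separated by whitespace), as in the IE 11 output shown in the question. It would also rewrite such a run if it happened to appear inside an HTML comment.

```javascript
// Hypothetical helper (not part of any browser API): collapse a run of
// consecutive </tbody> end tags, optionally separated by whitespace,
// into a single </tbody>. Adjacent </tbody> end tags never occur in
// well-formed markup, so a lone end tag is left untouched.
function collapseDuplicateTbodyEndTags(xml) {
  return xml.replace(/<\/tbody>(\s*<\/tbody>)+/gi, '</tbody>');
}
```

It would be applied to the serializer output, for example collapseDuplicateTbodyEndTags(xmlSerializer.serializeToString(doc)). A string-level patch like this is fragile compared with a real parser fix, so it is probably worth checking the result with parseFromString(result, 'application/xhtml+xml') before relying on it.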

0 Answers