I found the Mammoth Python package a couple of days ago, and it's a great tool that creates really clean HTML from a Word document. It's nearly perfect. There is just one artifact I don't understand: the heading elements (h1-h6) it creates from the Word headings contain several empty <a> elements with strange TOC ids. It looks like this:
<h1><a id="_Toc48228035"></a><a id="_Toc48288791"></a><a id="_Toc48303673"></a><a id="_Toc48306159"></a><a id="_Toc48308644"></a><a id="_Toc48311128"></a><a id="_Toc48313611"></a>Arteriosklerose</h1>
Does anybody know how to get rid of these?
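The only workaround I can think of is to post-process the HTML string after the conversion. Here is a rough sketch using only the standard library `re` module. It assumes the anchors are always empty and always have the exact form `<a id="_Toc…"></a>` with a numeric suffix, as in the output above; `strip_toc_anchors` is just a name I made up, not part of Mammoth.

```python
import re

# Matches an empty anchor whose id is "_Toc" followed by digits,
# e.g. <a id="_Toc48228035"></a>.
# Assumption: the anchors are always empty and use exactly this attribute form.
TOC_ANCHOR = re.compile(r'<a id="_Toc\d+"></a>')


def strip_toc_anchors(html: str) -> str:
    """Remove empty Word TOC bookmark anchors from an HTML string."""
    return TOC_ANCHOR.sub("", html)


html = '<h1><a id="_Toc48228035"></a><a id="_Toc48288791"></a>Arteriosklerose</h1>'
print(strip_toc_anchors(html))  # prints <h1>Arteriosklerose</h1>
```

This would leave any other anchors alone, but it feels like a hack, so I would prefer a way to stop the anchors from being generated in the first place.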
Thanks in advance
Cheers, Peter