I can't get xml.etree.ElementTree to print or acknowledge the correct XHTML header. It insists on giving a generic XML header, prefixing all tags with "html:", throwing exceptions, or a combination of those.
How do I create a valid XHTML document in the first place?
I've got about 4 megabytes of XML files, and I'm trying to create a valid EPUB from them. Various munging needs to be done; <chapter> tags have no place in XHTML, for instance.
The following code:
import xml.etree.ElementTree as ET
xhtml = ET.fromstring(
"<?xml version=\"1.0\" xmlns=\"http://www.w3.org/1999/xhtml\" ?>\n<head><title></title></head>\n<body>\n</body>")
throws:
xml.etree.ElementTree.ParseError: XML declaration not well-formed: line 1, column 31
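For reference, the XML declaration only accepts version, encoding, and standalone, which would explain the error at column 31 where xmlns begins. The sketch below reproduces the failure and then parses a variant in which the namespace sits on a root element instead; the GOOD string is an illustrative rearrangement, not something epubcheck has been run against.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The original input: xmlns is placed inside the XML declaration.
BAD = (
    '<?xml version="1.0" xmlns="http://www.w3.org/1999/xhtml" ?>\n'
    "<head><title></title></head>\n<body>\n</body>"
)

try:
    ET.fromstring(BAD)
    error = None
except ET.ParseError as exc:
    error = exc  # the declaration itself is rejected

# Illustrative variant: a plain declaration, with the namespace declared
# on a single root element that wraps head and body.
GOOD = (
    '<?xml version="1.0"?>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title></title></head>\n<body>\n</body></html>"
)

root = ET.fromstring(GOOD)
print(root.tag)  # namespaced tags are stored as {uri}localname
```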
If I instead give the "correct" XHTML header, it insists it's HTML, emits its own XML header, and prefixes all tags with "html:".
If I give the "correct" xml header, then epubcheck complains about "" not being a valid namespace (which I suppose it isn't).
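The "html:" prefixing can be reproduced in a few lines. In this sketch, calling ET.register_namespace with an empty prefix before serializing makes ElementTree write the XHTML namespace as the default namespace instead of a prefixed one. The SAMPLE document is made up for illustration, and note that register_namespace changes a process-wide setting.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical minimal document, used only to reproduce the behaviour.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>t</title></head><body/></html>"
)

root = ET.fromstring(SAMPLE)

# With no prefix registered by the caller, the serializer uses its
# built-in prefix for this namespace, which is where "html:" comes from.
prefixed = ET.tostring(root, encoding="unicode")

# Registering the empty prefix makes the serializer emit xmlns="..."
# on the root element and leave the tag names unprefixed.
ET.register_namespace("", XHTML_NS)
unprefixed = ET.tostring(root, encoding="unicode")

print(prefixed)
print(unprefixed)
```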
The theory is that if I could create (and subsequently write out) a valid XHTML document, I could parse my XML for the <body> and <title> that are needed, munge them appropriately (all the href and src attributes need to be changed, for instance), stick them in there, and be golden.
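A minimal sketch of the munging step, under stated assumptions: the SOURCE fragment, the rewrite_links helper, and the .xml to .xhtml extension swap are all hypothetical stand-ins, since the real files and the exact link changes will differ.

```python
import xml.etree.ElementTree as ET

# Made-up source fragment; the real files and tag names will differ.
SOURCE = (
    "<chapter><title>One</title>"
    '<p>See <a href="ch2.xml">next</a> <img src="pic.png"/></p></chapter>'
)


def rewrite_links(elem, old_ext=".xml", new_ext=".xhtml"):
    """Rewrite href/src values ending in old_ext, in place (hypothetical helper)."""
    for node in elem.iter():
        for attr in ("href", "src"):
            value = node.get(attr)
            if value is not None and value.endswith(old_ext):
                node.set(attr, value[: -len(old_ext)] + new_ext)


chapter = ET.fromstring(SOURCE)

# Pull out the text that would go into the XHTML <title>.
title_text = chapter.findtext("title")

rewrite_links(chapter)

# <chapter> has no XHTML equivalent, so reuse the element as a <div>.
chapter.tag = "div"

print(ET.tostring(chapter, encoding="unicode"))
```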
According to what I've found, a valid XHTML document MUST have <html xmlns="http://www.w3.org/1999/xhtml"> as its root element
and contain a head (with required title element) and a body. I'm not certain what (if any) of that I can leave out and still pass epubcheck's requirements.
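A sketch of what such a skeleton could look like when built directly with ElementTree, assuming an html root with head, title, and body. The q helper and the placeholder text are made up for illustration, and whether the result satisfies epubcheck has not been verified here.

```python
import io
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize the XHTML namespace as the default one (no "html:" prefix).
ET.register_namespace("", XHTML_NS)


def q(tag):
    """Qualify a local tag name with the XHTML namespace (hypothetical helper)."""
    return "{%s}%s" % (XHTML_NS, tag)


html = ET.Element(q("html"))
head = ET.SubElement(html, q("head"))
title = ET.SubElement(head, q("title"))
title.text = "Placeholder title"
body = ET.SubElement(html, q("body"))
ET.SubElement(body, q("p")).text = "Placeholder content"

# Write to an in-memory buffer; a real script would pass a file name.
buf = io.BytesIO()
ET.ElementTree(html).write(buf, encoding="utf-8", xml_declaration=True)
document = buf.getvalue().decode("utf-8")

print(document)
```

Elements copied in from other files would need the same namespace-qualified tag names, otherwise they are serialized without the XHTML namespace.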
Surely there's a way to force ET to use the correct header? Or do I need to use a different library, or what?