I'm using lxml to generate an XML file such as the one below. The documentation and other questions here on Stack Overflow nudged me in the right direction. What I'm struggling with are namespace prefixes such as those in the markList and mark nodes.
<?xml version='1.0' encoding='UTF-8'?><!DOCTYPE paula SYSTEM "paula_mark.dtd">
<paula version="1.1">
  <header paula_id="Layer_Annotation.0_0000.mark"/>
  <markList xmlns:xlink="http://www.w3.org/1999/xlink" type="Annotation" xml:base="text.xml">
    <!--foo-->
    <mark id="span1" xlink:href="#sTok1"/>
    <!--bar-->
    <mark id="span2" xlink:href="#sTok2"/>
  </markList>
</paula>
This is what I've got so far. As you can see from the output below, I'm stuck at the markList node and have been banging my head against this for a while now. Any further nudges would be much appreciated.
from lxml import etree


class XMLNamespaces:
    xlink = "http://www.w3.org/1999/xlink"
    xml = "text.xml"


top = etree.Element("paula", {"version": "1.1"})
header = etree.SubElement(top, "header", {"paula_id": "annotation.mark"})
mark_list = etree.SubElement(top, "markList", {
    etree.QName(XMLNamespaces.xlink, "xlink"): "http://www.w3.org/1999/xlink",
    "type": "Annotation",
    etree.QName(XMLNamespaces.xml, "xml"): "http://www.w3.org/1999/xlink",
})
body = etree.SubElement(top, "body")
body.text = "test body"

print(etree.tounicode(top, pretty_print=True))
Here is my current output:
<paula version="1.1">
  <header paula_id="annotation.mark"/>
  <markList xmlns:ns0="http://www.w3.org/1999/xlink" xmlns:ns1="text.xml" ns0:xlink="http://www.w3.org/1999/xlink" type="Annotation" ns1:xml="http://www.w3.org/1999/xlink"/>
  <body>test body</body>
</paula>
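For comparison, here is a minimal sketch of one way the desired prefixes might be produced. It is not a definitive answer. It assumes that the xlink prefix should be declared through the nsmap argument of SubElement rather than written as an attribute, and that xml:base belongs to the reserved XML namespace (http://www.w3.org/XML/1998/namespace), with "text.xml" as its value rather than its namespace. The constant names XLINK_NS and XML_NS are my own and purely illustrative.

```python
from lxml import etree

# Illustrative constant names; only the URIs themselves are fixed.
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # reserved namespace of the xml: prefix

top = etree.Element("paula", {"version": "1.1"})
etree.SubElement(top, "header", {"paula_id": "annotation.mark"})

# nsmap declares xmlns:xlink on markList; it is a prefix declaration, not an attribute.
mark_list = etree.SubElement(top, "markList", {"type": "Annotation"},
                             nsmap={"xlink": XLINK_NS})

# Namespaced attributes are addressed in Clark notation: {namespace-uri}local-name.
mark_list.set("{%s}base" % XML_NS, "text.xml")

mark = etree.SubElement(mark_list, "mark", {"id": "span1"})
mark.set("{%s}href" % XLINK_NS, "#sTok1")

result = etree.tostring(top, pretty_print=True, encoding="unicode")
print(result)
```

Because the mark elements are children of markList, they should pick up the xlink prefix declared there, so no ns0-style prefixes need to be generated.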