(Although the input file contains multiple <?xml ...?> declarations, this question is not answered by merely stating that it is 'not well-formed XML'. Please read further!)
I'm still working on the same project outlined in my previous question, "XSLT: choose template, variable length dt_assoc inside elem, building transform for DNS records format". Thanks to the good advice from @Tim C there, I am on to the next phase: parsing a text file made up of a series of XML "documents". That is, the file is structured like:
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204245" accountId="414660" name="addressing.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2012-10-10T21:33:36Z" created="2009-07-25T15:05:39Z">
<ns2:nameservers>
<ns2:nameserver name="dns1.stabletransit.com" />
<ns2:nameserver name="dns2.stabletransit.com" />
</ns2:nameservers>
<ns2:recordsList totalEntries="5">
<ns2:record id="A-2542579" type="A" name="addressing.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:02:16Z" />
<ns2:record id="NS-3093587" type="NS" name="addressing.com" data="dns1.stabletransit.com" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:03:16Z" />
<ns2:record id="NS-3093589" type="NS" name="addressing.com" data="dns2.stabletransit.com" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:03:16Z" />
<ns2:record id="CNAME-6051671" type="CNAME" name="vh1.addressing.com" data="vh1.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
<ns2:record id="CNAME-6051873" type="CNAME" name="www.addressing.com" data="virtual.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
</ns2:recordsList>
</ns2:domain>
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204245" accountId="414660" name="addressing.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2012-10-10T21:33:36Z" created="2009-07-25T15:05:39Z">
<ns2:nameservers>
<ns2:nameserver name="dns1.stabletransit.com" />
<ns2:nameserver name="dns2.stabletransit.com" />
</ns2:nameservers>
<ns2:recordsList totalEntries="5">
<ns2:record id="A-2542579" type="A" name="addressing.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:02:16Z" />
<ns2:record id="NS-3093587" type="NS" name="addressing.com" data="dns1.stabletransit.com" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:03:16Z" />
<ns2:record id="NS-3093589" type="NS" name="addressing.com" data="dns2.stabletransit.com" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:03:16Z" />
<ns2:record id="CNAME-6051671" type="CNAME" name="vh1.addressing.com" data="vh1.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
<ns2:record id="CNAME-6051873" type="CNAME" name="www.addressing.com" data="virtual.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
</ns2:recordsList>
</ns2:domain>
... etc ...
I'm trying to figure out the best way to process these individual chunks: each one must be passed to my XSLT transform, and the result then sent via an API POST to the remote server for processing (into new DNS zone records).
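One way the chunks could be separated is to split the raw text at each XML declaration, so that every piece is a well-formed document on its own. The following is only a minimal sketch under that assumption; `split_documents` and the two-domain `sample` string are hypothetical stand-ins, not part of my real script or data.

```python
import re
from xml.etree import ElementTree as ET

# Matches an XML declaration such as <?xml version='1.0' encoding='UTF-8'?>
XML_DECL = re.compile(r"<\?xml[^>]*\?>")


def split_documents(text):
    """Return one string per concatenated XML document, declarations removed."""
    chunks = XML_DECL.split(text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


# Hypothetical stand-in for the real file: two tiny concatenated documents.
sample = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0"'
    ' name="a.example" />\n'
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0"'
    ' name="b.example" />\n'
)

docs = split_documents(sample)
# Each chunk now parses on its own because it has exactly one root element.
names = [ET.fromstring(doc).get("name") for doc in docs]
print(names)
```

Splitting on the declaration avoids editing the source file by hand, but it assumes the string `<?xml` never appears inside the data itself.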
I'm a bit stuck. I experimented with ElementTree, thinking that if I added a new 'root' around the whole thing, I could make a single tree out of it and process each of the ns2:domain elements.
So I tried modifying the source like this, after deleting all but the initial <?xml ...?> declaration:
<?xml version='1.0' encoding='UTF-8'?>
<rackspace>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" ... >
...
</ns2:domain>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" ... >
...
</ns2:domain>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" ... >
...
</ns2:domain>
</rackspace>
However, I'm completely unfamiliar with ElementTree and can't seem to get any sort of handle on the "ns2:domain" subtrees, which I'd want to pull as a whole into a variable to hand off to the xslt transform.
#!/usr/bin/python2.7
import fileinput
import string
import re
import hashlib
from xml.etree import ElementTree as ET
from xml.etree.ElementTree import Element, SubElement, tostring
ns= {'ns2':'http://docs.rackspacecloud.com/dns/api/v1.0'}
my_outfile='/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/RackspaceDomains.out.txt'
my_infile='//Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/XSL_Rackspace_to_OpenSRS/saxon.test.xml'
'''FILE=open(my_infile,"r")
OUTFILE=open(my_outfile,"w")'''
print ("**** Start Reading from Input File ****")
with open(my_infile, 'rt') as f:
    tree = ET.parse(f)
root=tree.getroot()
# ET.dump(root)
domain=SubElement(root,"ns2:domain",ns)
#ET.dump(domain)
recordsList=SubElement(root,"ns2:recordsList",ns)
#parent_map = dict((c, p) for p in tree.getiterator() for c in p)
#print parent_map
for node in recordsList:
    for node in node:
        print node.tag, node.text
        for node in node:
            print node.tag, node.text
I have no doubt there are simple and straightforward steps to get this in place, but I just don't know the grammar!
So, in pseudo-code, perhaps something like this:
open my_rackspace_file.xml as rackfile
print "Start"
for each ns2:domain in rackfile:
    print "Processing ", ns2:domain/@name
    my_domain=getsubtree(ns2:domain)
    my_new_xml=`java saxon9he.jar net.sf.saxon.Transform -it < $my_domain` #Don't really know how this will work at the moment
    API_POST (my_new_xml)
print "Done"
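The pseudo-code above could be sketched in Python roughly as follows. This is an untested outline with several assumptions: it writes each domain to a temporary file and passes it to Saxon with the `-s:` and `-xsl:` options rather than the `-it` form shown above, and the names `build_saxon_command`, `transform_domain`, and `process_all` are hypothetical. The POST step is left as a caller-supplied function because the remote API is not described here.

```python
import os
import subprocess
import tempfile


def build_saxon_command(jar_path, source_path, stylesheet_path):
    """Build the argument list for running Saxon's command-line transformer."""
    return ["java", "-cp", jar_path, "net.sf.saxon.Transform",
            "-s:" + source_path, "-xsl:" + stylesheet_path]


def transform_domain(domain_xml, jar_path, stylesheet_path):
    """Write one serialized domain (bytes) to a temp file and transform it."""
    fd, source_path = tempfile.mkstemp(suffix=".xml")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(domain_xml)
        command = build_saxon_command(jar_path, source_path, stylesheet_path)
        # Saxon writes the result to stdout when no -o: option is given.
        return subprocess.check_output(command)
    finally:
        os.remove(source_path)


def process_all(domain_chunks, transform, post):
    """Transform each chunk, hand the result to post(), collect the replies."""
    replies = []
    for chunk in domain_chunks:
        replies.append(post(transform(chunk)))
    return replies


# Dry run with stand-in functions, so neither Java nor a network is needed.
demo_replies = process_all(
    [b"<domain name='a.example'/>", b"<domain name='b.example'/>"],
    transform=lambda xml: xml.upper(),   # stand-in for transform_domain
    post=lambda payload: len(payload),   # stand-in for the real API POST
)
print(demo_replies)
```

Passing `transform` and `post` in as functions keeps the loop testable on its own; in real use, `transform` would wrap `transform_domain` with the actual jar and stylesheet paths.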
Many thanks for thoughts and suggestions on this! It's great to dive in the deep end, and know that it will all make sense eventually!
PF
BTW, I'm using Saxon XSLT 2.0 because I needed the regex features...