(Although the file shown below contains multiple <?xml ..> declarations, the question is not answered by merely stating that this is 'not well-formed XML'. Please read further!)

I'm still working on the same project, outlined in my previous question, "XSLT: choose template, variable length dt_assoc inside elem, building transform for DNS records format". Thanks to the good advice from @Tim C there, I am on to the next phase. This has to do with parsing a text file which is made up of a series of XML "documents"; that is, the file is structured like:

<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204245" accountId="414660" name="addressing.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2012-10-10T21:33:36Z" created="2009-07-25T15:05:39Z">
    <ns2:nameservers>
        <ns2:nameserver name="dns1.stabletransit.com" />
        <ns2:nameserver name="dns2.stabletransit.com" />
    </ns2:nameservers>
    <ns2:recordsList totalEntries="5">
        <ns2:record id="A-2542579" type="A" name="addressing.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:02:16Z" />
        <ns2:record id="NS-3093587" type="NS" name="addressing.com" data="dns1.stabletransit.com" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:03:16Z" />
        <ns2:record id="NS-3093589" type="NS" name="addressing.com" data="dns2.stabletransit.com" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:03:16Z" />
        <ns2:record id="CNAME-6051671" type="CNAME" name="vh1.addressing.com" data="vh1.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
        <ns2:record id="CNAME-6051873" type="CNAME" name="www.addressing.com" data="virtual.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
    </ns2:recordsList>
</ns2:domain>
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204245" accountId="414660" name="addressing.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2012-10-10T21:33:36Z" created="2009-07-25T15:05:39Z">
    <ns2:nameservers>
        <ns2:nameserver name="dns1.stabletransit.com" />
        <ns2:nameserver name="dns2.stabletransit.com" />
    </ns2:nameservers>
    <ns2:recordsList totalEntries="5">
        <ns2:record id="A-2542579" type="A" name="addressing.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:02:16Z" />
        <ns2:record id="NS-3093587" type="NS" name="addressing.com" data="dns1.stabletransit.com" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:03:16Z" />
        <ns2:record id="NS-3093589" type="NS" name="addressing.com" data="dns2.stabletransit.com" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:03:16Z" />
        <ns2:record id="CNAME-6051671" type="CNAME" name="vh1.addressing.com" data="vh1.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
        <ns2:record id="CNAME-6051873" type="CNAME" name="www.addressing.com" data="virtual.eiotx.net" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:05:09Z" />
    </ns2:recordsList>
</ns2:domain>

... etc ...

I'm trying to figure out the best way to process these individual chunks: each one must be passed on its own to my XSLT transform, and the result then sent via an API POST to the remote server for processing (into new DNS zone records).
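One way to get at the individual chunks is to split the file at the text level, before any XML parser sees it. The following is a minimal sketch, assuming each chunk starts with its own XML declaration as in the listing above. The function name `split_documents` and the second domain name `example.org` are made up for illustration, and the code is written for Python 3.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration such as <?xml version='1.0' encoding='UTF-8'?>
XML_DECL = re.compile(r"<\?xml\s[^>]*\?>")

def split_documents(text):
    """Return the XML documents found in a concatenated dump, declarations removed."""
    chunks = XML_DECL.split(text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]

# Shortened stand-in for the real file; 'example.org' is a hypothetical second domain.
sample = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="addressing.com"/>\n'
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="example.org"/>\n'
)

for doc in split_documents(sample):
    # Each chunk is now a well-formed document with a single root element.
    root = ET.fromstring(doc)
    print(root.get("name"))
```

Each returned string can be parsed, or handed to an external tool, independently of the others.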

I'm a bit stuck. I have experimented with ElementTree, thinking that if I added a new 'root' around the whole thing, I could make a single tree out of it and process each of the ns2:domain elements in turn.

So I tried modifying the source like this, after deleting all but the initial <?xml ..> declaration:

<?xml version='1.0' encoding='UTF-8'?>
<rackspace>
    <ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" ... >
    ...
    </ns2:domain>
    <ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" ... >
    ...
    </ns2:domain>
    <ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" ... >
    ...
    </ns2:domain>
</rackspace>
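The same modification can be made programmatically instead of by hand. This is a rough sketch, assuming the input looks like the concatenated listing earlier: it removes every XML declaration and encloses the rest in a `<rackspace>` element. The helper name `wrap_in_root` and the second domain name `example.org` are illustrative only. Because each ns2:domain element declares its own namespaces, the wrapper element needs no declarations of its own.

```python
import re
import xml.etree.ElementTree as ET

def wrap_in_root(text, root_name="rackspace"):
    """Strip every XML declaration and enclose what is left in one root element."""
    body = re.sub(r"<\?xml\s[^>]*\?>", "", text)
    return "<{0}>{1}</{0}>".format(root_name, body)

# Shortened stand-in for the real file; 'example.org' is a hypothetical second domain.
sample = (
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="addressing.com"/>\n'
    "<?xml version='1.0' encoding='UTF-8'?>\n"
    '<ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="example.org"/>\n'
)

root = ET.fromstring(wrap_in_root(sample))
# Every direct child of the new root is one of the original documents.
names = [child.get("name") for child in root]
print(names)
```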

However, I'm completely unfamiliar with ElementTree and can't seem to get any sort of handle on the "ns2:domain" subtrees, each of which I'd want to pull as a whole into a variable to hand off to the XSLT transform.

#!/usr/bin/python2.7

import fileinput
import string
import re
import hashlib

from xml.etree import ElementTree as ET
from xml.etree.ElementTree import Element, SubElement, tostring

ns= {'ns2':'http://docs.rackspacecloud.com/dns/api/v1.0'}

my_outfile='/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/RackspaceDomains.out.txt'
my_infile='/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/XSL_Rackspace_to_OpenSRS/saxon.test.xml'

'''FILE=open(my_infile,"r")
OUTFILE=open(my_outfile,"w")'''

print ("**** Start Reading from Input File ****")

with open(my_infile, 'rt') as f:

     tree = ET.parse(f)

root=tree.getroot()
# ET.dump(root)

domain=SubElement(root,"ns2:domain",ns)
#ET.dump(domain)
recordsList=SubElement(root,"ns2:recordsList",ns)

#parent_map = dict((c, p) for p in tree.getiterator() for c in p)
#print parent_map

for node in recordsList:
     for node in node:
          print node.tag, node.text
          for node in node:
               print node.tag, node.text

I have no doubt there are simple and straightforward steps to getting this in place, but I just don't know the grammar!
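For reference, `SubElement()` creates a new, empty child element rather than looking up an existing one, which is why the loops in the script above have nothing to print. Searching is done with `find()` / `findall()`, passing the prefix-to-URI map as the second argument. A minimal sketch, assuming the file has already been wrapped in a single `<rackspace>` root; the inline sample is shortened, `domain_subtrees` is an illustrative name, and the code targets Python 3 (`encoding="unicode"` is not available in 2.7).

```python
import xml.etree.ElementTree as ET

NS = {"ns2": "http://docs.rackspacecloud.com/dns/api/v1.0"}

# Shortened stand-in for the wrapped file.
wrapped = """<rackspace>
  <ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" name="addressing.com">
    <ns2:recordsList totalEntries="1">
      <ns2:record type="A" name="addressing.com" data="198.101.155.141" ttl="300"/>
    </ns2:recordsList>
  </ns2:domain>
</rackspace>"""

root = ET.fromstring(wrapped)

def domain_subtrees(root):
    """Yield (name, serialized XML) for every ns2:domain child of root."""
    for domain in root.findall("ns2:domain", NS):
        yield domain.get("name"), ET.tostring(domain, encoding="unicode")

for name, xml_text in domain_subtrees(root):
    print("Processing", name)

# Records inside a domain are reached the same way, with a relative path.
for domain in root.findall("ns2:domain", NS):
    for record in domain.findall("ns2:recordsList/ns2:record", NS):
        print(record.get("type"), record.get("name"), record.get("data"))
```

When serializing, ElementTree may choose its own prefix (for example `ns0` instead of `ns2`). The namespace URI is unchanged, so a stylesheet that matches on the namespace rather than the literal prefix should not be affected.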

So, perhaps pseudo-code something like this:

open my_rackspace_file.xml as rackfile
print "Start"
for each ns2:domain in rackfile:
   print "Processing ", ns2:domain/@name
   my_domain=getsubtree(ns2:domain)
   my_new_xml=`java saxon9he.jar net.sf.saxon.Transform -it < $my_domain` #Don't really know how this will work at the moment
   API_POST (my_new_xml)

print "Done"
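The pseudo-code above could be fleshed out roughly as follows. This is only a sketch under several assumptions: the input has already been wrapped in one root element, the transform reads its source from standard input (the `-s:-` option should be checked against the Saxon command-line documentation for the version in use), and the stylesheet path and the posting function are supplied by the caller. The names `saxon_command`, `run_transform` and `process_domains` are hypothetical, and the code targets Python 3.

```python
import subprocess
import xml.etree.ElementTree as ET

NS = {"ns2": "http://docs.rackspacecloud.com/dns/api/v1.0"}

def saxon_command(stylesheet, jar="saxon9he.jar"):
    """Build a Saxon command line; '-s:-' (source from stdin) is an assumption to verify."""
    return ["java", "-cp", jar, "net.sf.saxon.Transform", "-s:-", "-xsl:" + stylesheet]

def run_transform(xml_text, command):
    """Pipe one document through an external command and return what it prints."""
    result = subprocess.run(command, input=xml_text.encode("utf-8"),
                            stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8")

def process_domains(root, transform, post):
    """Transform and post each ns2:domain subtree; return the domain names handled."""
    handled = []
    for domain in root.findall("ns2:domain", NS):
        name = domain.get("name")
        print("Processing", name)
        new_xml = transform(ET.tostring(domain, encoding="unicode"))
        post(new_xml)
        handled.append(name)
    return handled

# Demo with stand-ins so the sketch runs without Java or a network connection.
print("Start")
root = ET.fromstring(
    '<rackspace><ns2:domain xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" '
    'name="addressing.com"/></rackspace>'
)
posted = []
process_domains(root, transform=lambda xml_text: xml_text, post=posted.append)
print("Done")
```

In real use, `transform` would be something like `lambda x: run_transform(x, saxon_command(path_to_stylesheet))` and `post` a function that sends the result to the remote API. Keeping both as parameters lets the loop be tested without either dependency.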

Many thanks for thoughts and suggestions on this! It's great to dive in the deep end, and know that it will all make sense eventually!

PF

BTW, I'm using Saxon XSLT 2.0 because I needed the regex features...

  • You cannot use an XML parser to solve this problem because your text is not XML. You must first remove the extra XML declaration, preferably by fixing the problem at the source, or by manually or programmatically repairing it at the ***text*** level, before you can use any conformant XML-based tools or libraries. – kjhughes Mar 17 '17 at 02:08
  • Note also that besides having multiple XML declarations, your textual objects also have multiple root elements. Again, this well-formedness issue must be resolved by (preferably) fixing the source or, if necessary, making repairs at the text, not XML, level. – kjhughes Mar 17 '17 at 02:15
  • Thanks for your input @kjhughes ... I don't think it's quite as straightforward as you have it. I'm already showing that I – pfraterdeus Mar 17 '17 at 03:09
  • @kjhughes sorry, do you mean multiple root elements, in that the DTD says that ns2:domain is a root element, and therefore even if I manually add an enclosing element, it will not parse as well-formed? Apologies, this is all pretty new to me! Thanks – pfraterdeus Mar 17 '17 at 03:19
  • An XML document may only have a single root element, yes. – kjhughes Mar 17 '17 at 03:57
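The single-root rule discussed in these comments is easy to observe with ElementTree itself. A small sketch using made-up, namespace-free element names: two sibling top-level elements are rejected, while the same text enclosed in one element parses.

```python
import xml.etree.ElementTree as ET

two_roots = "<domain name='a'/><domain name='b'/>"

try:
    ET.fromstring(two_roots)
    parsed = True
except ET.ParseError as err:
    # The parser refuses a second top-level element.
    parsed = False
    print("rejected:", err)

# Enclosing the same text in one element gives a single root, which parses.
wrapped = ET.fromstring("<rackspace>" + two_roots + "</rackspace>")
print(len(wrapped), "child elements")
```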
