I'm writing code that creates configuration files for our program. The configuration files are in an XML format.

I'm using Python's xml.dom.minidom module to parse the XML file. I want to be able to break up the configuration file into 2 or more smaller XML files.

Say I have a file called main_config.xml:

<configuration>
   <server name="my_servername" ip="10.10.10.10">
     <disk>
       <volume>/dev/sdc1</volume>
     </disk>
   </server>

   <server>
     <redirect loc="./child.xml"/>
   </server>

   <server name="server3" ip="10.10.10.13">
     <disk>
       <volume>/dev/sdf1</volume>
     </disk>
   </server>
</configuration>

The <redirect> element has a "loc" attribute that points to a file named "child.xml".

The file "child.xml" contains this:

<server name="server2" ip="10.10.10.12">
    <disk>
       <volume>/dev/sde1</volume>
    </disk>
</server>

{Note: these are small, simple configuration files. The ones I work with are much, much longer (5000 lines or so) and are hard to edit; hence the proposal to break the configuration file into several smaller, more modular ones that are easier to edit.}

What I want to do with xml.dom.minidom is:

1) Read in the XML document in main_config.xml

2) Parse the XML document from main_config.xml

3) If I see a <redirect> element, go to the file that its "loc" attribute points to

4) Read in the XML document from "child.xml"

5) Replace the <server> element that contains the <redirect> child in main_config.xml with the <server> element from child.xml, so that the XML document looks like this:

<configuration>
   <server name="my_servername" ip="10.10.10.10">
     <disk>
       <volume>/dev/sdc1</volume>
     </disk>
   </server>

   <server name="server2" ip="10.10.10.12">
       <disk>
          <volume>/dev/sde1</volume>
       </disk>
   </server>

   <server name="server3" ip="10.10.10.13">
     <disk>
       <volume>/dev/sdf1</volume>
     </disk>
   </server>
</configuration>

Using xml.dom.minidom, I can already do steps 1 through 4. However, I am stuck on step 5: the <server> element in main_config.xml has nodeType ELEMENT_NODE, but the object I get back from parsing child.xml is the whole document (nodeType DOCUMENT_NODE), not its <server> element. So I cannot use node.replaceChild(), because xml.dom.minidom complains that you can't put an XML document as a child of <configuration>.

There is one way I could go about this: walk through the XML tree of the document parsed from child.xml, delete the <server> element from the main_config.xml document, then create a new <server> element in main_config.xml with all the nodes/attributes from child.xml. But I'd rather not do that unless it's the last resort.

Is there any other way to replace an ELEMENT_NODE with a DOCUMENT_NODE? Is there a way to change the nodeType of a node object so that replaceChild() works? (The xml.dom.minidom documentation says that nodeType is read-only, though.)

SQA777
  • Using minidom is making things much harder for yourself than need be. I'd *strongly* suggest ElementTree, or the 3rd-party `lxml.etree` replacement. (And not just for usability concerns; `xml.dom.minidom` is also deprecated for security reasons). – Charles Duffy Sep 28 '15 at 21:00
  • ...that said, "can't put an XML document as a child" is an error with an obvious fix: Get a handle on the root node, not the document itself. – Charles Duffy Sep 28 '15 at 21:01
  • Charles, re: comment #1. I looked into ElementTree, and my first version of the script used it instead of xml.dom.minidom. However, ElementTree does not have an equivalent of xml.dom.minidom's node.parentNode, which lets me get the parent of a node. That's why I went with xml.dom.minidom. – SQA777 Sep 28 '15 at 21:33
  • Comment #2: how would I get the handle of the root object from the document object? – SQA777 Sep 28 '15 at 21:34
  • Can't speak with certainty for ElementTree, but lxml.etree *certainly* has that API. – Charles Duffy Sep 28 '15 at 21:35
  • re: #2, it's been over a decade since I've tried to do anything with minidom (and I found it was a pain even then), so I'd need to dig to figure out the local implementation. That said, having those be two distinct objects that both exist is part of the Document Object Model, so... well, you can certainly expect it to be possible. – Charles Duffy Sep 28 '15 at 21:36
  • ...in lxml.etree, it's just `Element.getparent()` -- or, if you have a document and want a root node, `ElementTree.getroot()`. – Charles Duffy Sep 28 '15 at 21:38
  • ...also, various workarounds for ElementTree exist, ugly as they are; see http://stackoverflow.com/questions/2170610/access-elementtree-node-parent-node – Charles Duffy Sep 28 '15 at 21:39
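
The suggestion in the comments (work with the root element rather than the document) can be sketched in xml.dom.minidom as follows. This is a minimal sketch, not the asker's actual script: the function name expand_redirects and the base_dir parameter are hypothetical, and it assumes every <redirect> sits directly inside the <server> it is meant to replace. It relies on Document.documentElement to get the root ELEMENT_NODE of the child document and on Document.importNode() to copy that element into the main document before calling replaceChild().

```python
import os
from xml.dom import minidom


def expand_redirects(doc, base_dir="."):
    """Swap each <server> that wraps a <redirect loc="..."/> for the root
    element of the file that loc points to (hypothetical helper)."""
    # Copy the search result into a plain list before modifying the tree.
    for redirect in list(doc.getElementsByTagName("redirect")):
        placeholder = redirect.parentNode  # the wrapping <server> element
        child_path = os.path.join(base_dir, redirect.getAttribute("loc"))
        child_doc = minidom.parse(child_path)
        # child_doc is a DOCUMENT_NODE; its documentElement is the root
        # ELEMENT_NODE (<server> in child.xml). importNode makes a deep copy
        # owned by the main document, which replaceChild can then accept.
        replacement = doc.importNode(child_doc.documentElement, True)
        placeholder.parentNode.replaceChild(replacement, placeholder)
    return doc
```

With the files from the question, calling expand_redirects(minidom.parse("main_config.xml")) and then doc.toxml() should give a document in which the placeholder <server> has been replaced; nested redirects inside child files are not followed in this sketch.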

1 Answer

Attention! Attention! Fellow Pythoners, please add XSLT to your everyday practices. I have answered questions like this quite a few times now (even for other languages: R, VBA, PHP). Much like SQL, XSLT is a special-purpose declarative language, in this case used to restructure, reformat, style, and transform XML documents for various end uses.

With that said, please consider the XSLT solution below. Python's third-party lxml module includes an XSLT 1.0 processor. Specifically for you, XSLT has a document() function that allows content from external XML files to be pulled into the transformed file. Additionally, if/then/else logic with xsl:choose, xsl:when, and xsl:otherwise can handle the server node that holds the redirect. Finally, the file name can be passed dynamically through a variable (here, the value of the redirect/@loc attribute). Be sure child.xml is in the same directory as main_config.xml.

XSLT script (to be saved as .xsl, or embedded in the .py file and, if so, loaded with lxml's fromstring())

<?xml version="1.0" ?> 
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"> 

<xsl:template match="configuration">
  <configuration>
     <xsl:for-each select="//server">             
       <xsl:choose>
         <xsl:when test="count(redirect)>0">
            <xsl:variable select="redirect/@loc" name="xmldoc"/>
            <xsl:copy-of select="document($xmldoc)/server" />
         </xsl:when>
         <xsl:otherwise>          
            <xsl:copy-of select="." />          
         </xsl:otherwise>
       </xsl:choose>        
     </xsl:for-each>       
  </configuration>
</xsl:template> 

</xsl:transform>

Python script

import os
import lxml.etree as ET

# GET CURRENT PATH
cd = os.path.dirname(os.path.abspath(__file__))

# LOAD ORIGINAL XML AND XSL FILES
dom = ET.parse(os.path.join(cd, 'main_config.xml'))
xslt = ET.parse(os.path.join(cd, 'XSLTscript.xsl'))

# TRANSFORM XML
transform = ET.XSLT(xslt)
newdom = transform(dom)

# OUTPUT FINAL XML
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)

with open(os.path.join(cd, 'output.xml'), 'wb') as xmlfile:
    xmlfile.write(tree_out)

Output

<?xml version='1.0' encoding='UTF-8'?>
<configuration>
  <server name="my_servername" ip="10.10.10.10">
     <disk>
       <volume>/dev/sdc1</volume>
     </disk>
   </server>
  <server name="server2" ip="10.10.10.12">
    <disk>
       <volume>/dev/sde1</volume>
    </disk>
</server>
  <server name="server3" ip="10.10.10.13">
     <disk>
       <volume>/dev/sdf1</volume>
     </disk>
   </server>
</configuration>
Parfait
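
For comparison, where lxml is not available, the same substitution can be sketched with the standard library's xml.etree.ElementTree. This is a different technique from the XSLT above (a plain Python loop, since ElementTree has no XSLT processor), and it is only a sketch: the function name expand_redirects_et and the base_dir parameter are hypothetical. It sidesteps the missing parent pointer mentioned in the comments by looping over each parent's children, so the parent is already in hand when a placeholder is found.

```python
import os
import xml.etree.ElementTree as ET


def expand_redirects_et(root, base_dir="."):
    """Swap each <server> that holds a <redirect loc="..."/> child for the
    root element of the referenced file (hypothetical helper)."""
    # Take a snapshot of all elements first, since the tree is edited below.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            redirect = child.find("redirect")
            if child.tag == "server" and redirect is not None:
                child_path = os.path.join(base_dir, redirect.get("loc"))
                replacement = ET.parse(child_path).getroot()
                # Keep the whitespace that followed the placeholder element.
                replacement.tail = child.tail
                # Assigning by index replaces the child in place.
                parent[index] = replacement
    return root
```

Because the elements are snapshotted before any replacement, redirects nested inside an included file are left as they are; handling those would need a recursive call on each replacement.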