I have done a lot of research on this topic, and the issue I'm having differs depending on which method I use. The files are XML files. What I'm trying to do is take a template file, for example:

<?xml version= "1.0" encoding= "iso-8859-1"?>
<r:root xmlns:p="./file" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Version>1</Version>
 <Parent Number="">
 </Parent>
</r:root>

and insert a node from another file into the template under the Parent tag. The file to insert:

<?xml version= "1.0" encoding= "iso-8859-1"?>
<Child ID="" Type="">
 <Sub1>text</Sub1>
 <Sub2>text</Sub2>
 <Sub3>text</Sub3>
 <Sub4>text</Sub4>
 <Nest1>
  <Sub1>text</Sub1>
  <Sub2>text</Sub2>
 </Nest1>
</Child>

I'm currently trying to use the deepcopy method: I parse both files and deepcopy the root of the file to insert.

lxml method issues: when I insert the node into the parent tree and print the new tree, this is the output:

<?xml version= "1.0" encoding= "iso-8859-1"?>
<r:root xmlns:p="./file" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Version>1</Version>
  <Parent Number="">
 <Child ID="" Type=""><Sub1>text</Sub1><Sub2>text</Sub2><Sub3>text</Sub3><Sub4>text</Sub4><Nest1><Sub1>text</Sub1><Sub2>text</Sub2></Nest1></Child></Parent>
</r:root>

ElementTree method issues: I couldn't get the pretty print to look right using the minidom prettify function below, and it would turn r:root into ns0:root.

    import xml.etree.ElementTree as ET
    import xml.dom.minidom as MD

    def prettify(root, encoder):
        # Serialize the tree, then reparse it with minidom to pretty print it
        rough_string = ET.tostring(root, str(encoder))
        reparse = MD.parseString(rough_string)
        return reparse.toprettyxml(indent=" ", newl="")

BeautifulSoup method issue: I got it to work when parsing as HTML, but that lowercased every tag name, which I can't have, and I wasn't able to get the XML parser to work.

All I need is for the inserted node to keep its pretty-printed structure.

What am I doing wrong or missing here to make this work?

  • [Here's an answer](https://stackoverflow.com/a/68422378/2834978) on a similar issue. You don't need minidom to pretty print. – LMC Jul 19 '21 at 22:03
  • [This answer](https://stackoverflow.com/a/4590052/2834978) also worked for me. – LMC Jul 19 '21 at 23:09
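
For the ElementTree route described in the question, both symptoms can be handled with the standard library alone. This is a minimal sketch, not taken from the linked answers. It assumes the template declares the prefix as xmlns:r="./file" and that Python 3.9+ is available for `ET.indent`. The inline XML strings and variable names are illustrative stand-ins for the real files.

```python
import copy
import xml.etree.ElementTree as ET

# Register the prefix so the namespace is written as r: instead of ns0:
# (assumes the template really uses xmlns:r="./file").
ET.register_namespace("r", "./file")

# Illustrative stand-ins for the template file and the file to insert.
template_root = ET.fromstring(
    b'<r:root xmlns:r="./file">\n'
    b'<Version>1</Version>\n'
    b' <Parent Number="">\n'
    b' </Parent>\n'
    b'</r:root>'
)
child_root = ET.fromstring(
    b'<Child ID="" Type="">\n <Sub1>text</Sub1>\n</Child>'
)

# Insert a copy of the whole Child element under Parent.
parent = template_root.find("Parent")
parent.append(copy.deepcopy(child_root))

# Recompute all whitespace-only text and tails (Python 3.9+),
# using one space per level like the original files.
ET.indent(template_root, space=" ")

result = ET.tostring(template_root, encoding="unicode")
print(result)
```

Note that ElementTree only writes namespace declarations that are actually used by an element or attribute, so an unused declaration such as xmlns:xsi would be dropped from the output.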

1 Answer

Since your question is tagged lxml, let's use that. But first note that in your template file you have a typo: xmlns:p="./file" should probably be xmlns:r="./file" (since your first element is r:root). Assuming that's fixed, you can:

from lxml import etree
temp = """<?xml version= "1.0" encoding= "iso-8859-1"?>
<r:root xmlns:r="./file" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Version>1</Version>
 <Parent Number="">
 </Parent>
</r:root>"""
#I modified the source file a bit to differentiate the various element levels
source = """<?xml version= "1.0" encoding= "iso-8859-1"?>
<Child ID="" Type="">
 <Sub1>text1</Sub1>
 <Sub2>text2</Sub2>
 <Sub3>text3</Sub3>
 <Sub4>text4</Sub4>
 <Nest1>
  <NSub1>textN1</NSub1>
  <NSub2>textN2</NSub2>
 </Nest1>
</Child>"""
    
temp_doc = etree.XML(temp.encode())
source_doc = etree.XML(source.encode())
    
#get the elements to be inserted in the template
ins = source_doc.xpath('//Child/*')
#locate the place in the template where these elements are to be inserted
destination = temp_doc.xpath('//Parent')[0]
#now insert them
for i in reversed(ins):
    destination.insert(0,i)
print(etree.tostring(temp_doc).decode())

Output:

<r:root xmlns:r="./file" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<Version>1</Version>
 <Parent Number="">
 <Sub1>text1</Sub1>
 <Sub2>text2</Sub2>
 <Sub3>text3</Sub3>
 <Sub4>text4</Sub4>
 <Nest1>
  <NSub1>textN1</NSub1>
  <NSub2>textN2</NSub2>
 </Nest1>
</Parent>
</r:root>
Jack Fleeting
  • is there a reason you put it into string instead of using the parse function to read the file? – Samuel Smith Jul 20 '21 at 14:44
  • this was my approach in lxml:

        index = 0
        for key in node_dict:
            i = 1
            while i <= node_dict[key]:
                insert_node = XML.grab_node(key)
                print("___________\n___________")
                parent_node.insert(index, insert_node)
                pretty2 = LET.tostring(parent_root, pretty_print=True).decode()
                print(pretty2)
                print("___________\n___________")
                i += 1
                index += 1

    – Samuel Smith Jul 20 '21 at 14:59
  • the helper it calls:

        @staticmethod
        def grab_node(node_tag):
            _node_file = node_tag.lower()+' node'
            _node_path = fr"../{_node_file}.xml"
            _parser = LET.parse(_node_path, _parser)
            _tree = LET.parse(_node_path, _parser)
            _root = _tree.getroot()
            _pretty = LET.tostring(_root, pretty_print=True).decode()
            print(_pretty)
            return copy.deepcopy(_root)

    – Samuel Smith Jul 20 '21 at 15:01
  • I was trying to use deepcopy to copy the Child node, but when I insert it into the parent node, the Child node is inserted without the \n – Samuel Smith Jul 20 '21 at 15:02
  • @SamuelSmith I put it into strings because I don't have the files; just the snippets in your question. – Jack Fleeting Jul 20 '21 at 16:37
  • @SamuelSmith Not sure what you mean. The output is what you have in your question. – Jack Fleeting Jul 20 '21 at 20:56
  • I know the code I have produces a valid XML file, but when I try to pretty print it, it doesn't look 100% right. I wish I could just show you. Your code works; I'm just not sure how to introduce it into my code. I probably did a poor job explaining it in my question. – Samuel Smith Jul 20 '21 at 20:59
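
For the deepcopy approach described in the comments, the missing line breaks can also be repaired after insertion rather than at parse time. The sketch below assumes lxml 4.5 or newer, where `etree.indent` is available. It uses small inline documents in place of the real files, and the variable names are hypothetical rather than taken from the commenter's code.

```python
import copy
from lxml import etree

# Illustrative stand-ins for the parsed template and node files.
template_root = etree.fromstring(
    b'<root><Parent Number="">\n </Parent></root>'
)
child_root = etree.fromstring(
    b'<Child ID="" Type="">\n <Sub1>text</Sub1>\n</Child>'
)

# Insert a deep copy of the node, as in the commenter's approach.
parent = template_root.find("Parent")
parent.insert(0, copy.deepcopy(child_root))

# Rewrite every whitespace-only text and tail in place, using
# one space per level to match the original files.
etree.indent(template_root, space=" ")

result = etree.tostring(template_root).decode()
print(result)
```

Calling `etree.indent` once on the root after all insertions, and then serializing without pretty_print, avoids depending on how each inserted node's own whitespace happened to look.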