file 1:

<xmlsource>
    <sections>
        <section>
            <name>section 1</name>
            <path>path to section 1</path>
        </section>  
    </sections>
    <items>
        <item>
            <name>item 1</name>
            <path>path to item 1</path>
        </item>
        <item>
            <name>item 2</name>
            <path>path to item 2</path>
        </item>     
    </items>
    <forms>
        <form>
            <name>form 1</name>
            <path>path to form 1</path>
        </form> 
    </forms>
</xmlsource>

file 2:

<item>
    <name>item 3</name>
    <path>path to item 3</path>
</item>
<item>
    <name>item 4</name>
    <path>path to item 4</path>
</item>     

How can I merge/append file 2 into file 1 (using Python) so that the result looks like this:

<xmlsource>
    <sections>
        <section>
            <name>section 1</name>
            <path>path to section 1</path>
        </section>  
    </sections>
    <items>
        <item>
            <name>item 1</name>
            <path>path to item 1</path>
        </item>
        <item>
            <name>item 2</name>
            <path>path to item 2</path>
        </item> 
        <item>
            <name>item 3</name>
            <path>path to item 3</path>
        </item>
        <item>
            <name>item 4</name>
            <path>path to item 4</path>
        </item>     
    </items>
    <forms>
        <form>
            <name>form 1</name>
            <path>path to form 1</path>
        </form> 
    </forms>
</xmlsource>

a. The order of item 1 through item 4 is not important, as long as they all end up in the same group:

<items>
......
</items>

b. After merging/appending, the indentation (tabs) in the new file must be consistent with the original.

Thanks a bunch.

            ____________________________

I saved file 1 as sample1.xml, file 2 as sample2.xml, and the Python code below as combinexml.py, all in C:\Users\BB\Desktop\CombineXML\, then ran the script from IDLE. Here is what I got; please help:

Traceback (most recent call last):
  File "C:\Users\BB\Desktop\CombineXML\combinexml.py", line 46, in <module>
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
  File "C:\Users\BB\Desktop\CombineXML\combinexml.py", line 7, in __init__
    self.roots = [et.parse(f).getroot() for f in filenames]
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1182, in parse
    tree.parse(source, parser)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 656, in parse
    parser.feed(data)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1506, in _raiseerror
    raise err
ParseError: junk after document element: line 5, column 8

If this code works, what are the name and location of the combined XML file? Or will file 1 or file 2 hold the merged content, with no new XML file created? Thank you.

  • Have you tried something? Which parser did you use, lxml? The second file also needs a wrapper (root) element. – Vivek Sable Jul 13 '15 at 02:32
  • I'm a bit confused by your example data. Is `item 1` the same thing in both files, or are you renumbering the items to get `item 3`? The error you're showing has to do with trying to parse the second file as a single XML document, which it is not, since there are multiple top level elements (rather than a root element that contains all the rest). – Blckknght Jul 14 '15 at 01:39
  • I just corrected the item numbers in file 2. I want to merge file 2 into file 1 but do not know how; I tried the code and got the error. – user3833361 Jul 14 '15 at 01:55
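
The point made in the comments can be reproduced in a few lines. This is only a sketch: the `fragment` string is a hypothetical inline copy of file 2, and `<wrapper>` is an arbitrary tag name chosen for illustration. Parsing the fragment directly fails because it has two top-level elements, while wrapping it in a temporary root gives the parser a single document to work with.

```python
from xml.etree import ElementTree as et

# Hypothetical inline copy of file 2: two top-level <item> elements, no root.
fragment = """<item>
    <name>item 3</name>
    <path>path to item 3</path>
</item>
<item>
    <name>item 4</name>
    <path>path to item 4</path>
</item>"""

try:
    et.fromstring(fragment)
    parsed_directly = True
except et.ParseError as err:
    # Same family of error as in the traceback: junk after document element
    parsed_directly = False
    print("direct parse failed: %s" % err)

# Wrapping the fragment in a temporary root turns it into one document.
wrapped = et.fromstring("<wrapper>" + fragment + "</wrapper>")
names = [item.findtext("name") for item in wrapped.findall("item")]
print(names)
```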

1 Answer

From the answer:

from xml.etree import ElementTree as et

class XMLCombiner(object):
    def __init__(self, filenames):
        assert len(filenames) > 0, 'No filenames!'
        # save all the roots, in order, to be processed later
        self.roots = [et.parse(f).getroot() for f in filenames]

    def combine(self):
        for r in self.roots[1:]:
            # combine each element with the first one, and update that
            self.combine_element(self.roots[0], r)
        # return the string representation
        return et.tostring(self.roots[0])

    def combine_element(self, one, other):
        """
        This function recursively updates either the text or the children
        of an element if another element is found in `one`, or adds it
        from `other` if not found.
        """
        # Create a mapping from tag name to element, as that's what we are filtering with
        mapping = {el.tag: el for el in one}
        for el in other:
            if len(el) == 0:
                # Not nested
                try:
                    # Update the text
                    mapping[el.tag].text = el.text
                except KeyError:
                    # An element with this name is not in the mapping
                    mapping[el.tag] = el
                    # Add it
                    one.append(el)
            else:
                try:
                    # Recursively process the element, and update it in the same way
                    self.combine_element(mapping[el.tag], el)
                except KeyError:
                    # Not in the mapping
                    mapping[el.tag] = el
                    # Just add it
                    one.append(el)

if __name__ == '__main__':
    r = XMLCombiner(('sample1.xml', 'sample2.xml')).combine()
    print('-' * 20)
    print(r)
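
Note that the code above only prints the combined string and never writes a file, and it matches children by tag name, so on its own it would not place the new `<item>` elements under `<items>`. Below is a minimal sketch of one way to get the requested result. It assumes file 2 stays a bare fragment with no XML declaration, that file 1 has exactly one `<items>` element under its root, that elements with children carry no meaningful text of their own, and that four-space indentation is wanted. The names `merge_items`, `indent`, and the `<wrapper>` tag are hypothetical and not part of the original answer.

```python
from xml.etree import ElementTree as et


def indent(elem, level=0, step="    "):
    """Rewrite whitespace in place so each nesting level is indented by `step`.

    Assumes elements that have children hold no meaningful text themselves.
    """
    pad = "\n" + step * level
    children = list(elem)
    if children:
        elem.text = pad + step            # whitespace before the first child
        for child in children:
            indent(child, level + 1, step)
            child.tail = pad + step       # whitespace before the next sibling
        children[-1].tail = pad           # whitespace before the closing tag


def merge_items(main_path, fragment_path, out_path):
    """Append every <item> in the fragment file to <items> of the main file."""
    tree = et.parse(main_path)
    items = tree.getroot().find("items")

    # The fragment has several top-level elements, so wrap it in a
    # temporary root before parsing (assumes no XML declaration in it).
    with open(fragment_path) as fh:
        wrapped = et.fromstring("<wrapper>" + fh.read() + "</wrapper>")

    for item in wrapped.findall("item"):
        items.append(item)

    indent(tree.getroot())
    tree.write(out_path)                  # inputs are left untouched
```

Calling `merge_items('sample1.xml', 'sample2.xml', 'combined.xml')` from the folder that holds the two samples should write the merged document to a new file, `combined.xml` (a made-up name), next to them.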
Community
Parth