Merging XML elements while keeping the contents using python

Question

I've been looing around for a method to remove an element from an XML document,while keeping the contents, using Python, but i haven't been able to find an answer that works.

Basically, i received an XML document in the following format (example):

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

What i have to do is to merge element2 and element3 into element1 such that the output XML document looks like:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

I would appreciate some tips on my (hopefully) simple problem.

Note: I am somewhat new to Python as well, so bear with me.

Do you want to _remove_ `element1`, or do you want to _merge_ them? — tobias_k, Aug 01 '14 at 08:17
http://stackoverflow.com/questions/15921642/merging-xml-files-using-pythons-elementtree — Avinash Babu, Aug 01 '14 at 08:22
First you need to find all between tags. Append to array all elements with those values. Then you can create your new XML file. — Fox_01, Aug 01 '14 at 08:32

score 0 · Answer 1 · answered Aug 01 '14 at 09:41

This might not be the prettiest of solutions, but since there's no other answer yet...

You could just search for, e.g., </element1><element1> and replace it with the empty string.

xml = """<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>"""

import re
print re.sub(r"\s*</element1>\s*<element1>", "", xml)

Or more generally, re.sub(r"\s*</([a-zA-Z0-9_]+)>\s*<\1>", "", xml) to merge all consecutive instances of the same element, by matching the first element name as a group and then looking for that same group with \1.

Output, in both cases:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

For more complex documents, you might want to use one of Python's many XML libraries instead.

This should work until i find a better way using an XML library. Thank you very much. — Alex-C, Aug 01 '14 at 11:07

Merging XML elements while keeping the contents using python

1 Answers1