
I've been looking around for a method to remove an element from an XML document, while keeping its contents, using Python, but I haven't been able to find an answer that works.

Basically, I received an XML document in the following format (example):

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

What I have to do is merge element2 and element3 into a single element1, so that the output XML document looks like:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>

I would appreciate some tips on my (hopefully) simple problem.

Note: I am somewhat new to Python as well, so bear with me.

Alex-C

1 Answer


This might not be the prettiest of solutions, but since there's no other answer yet...

You could just search for, e.g., </element1><element1> (allowing for whitespace between the two tags) and replace it with the empty string.

xml = """<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>"""

import re
# Drop each closing </element1> that is directly followed by an opening <element1>
print(re.sub(r"\s*</element1>\s*<element1>", "", xml))

Or more generally, re.sub(r"\s*</([a-zA-Z0-9_]+)>\s*<\1>", "", xml) to merge all consecutive instances of the same element, by matching the first element name as a group and then looking for that same group with \1. Note that this pattern only matches opening tags without attributes.

Output, in both cases:

<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>
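The generalized pattern could be wrapped in a small helper, roughly as sketched below. The names `ADJACENT_SAME_TAG` and `merge_adjacent` and the sample string are made up for illustration, and the sketch assumes tags without attributes whose names use only letters, digits, and underscores.

```python
import re

# A closing tag, optional whitespace, then an opening tag with the same name.
# Assumption: no attributes on the opening tag, names limited to [a-zA-Z0-9_].
ADJACENT_SAME_TAG = re.compile(r"\s*</([a-zA-Z0-9_]+)>\s*<\1>")

def merge_adjacent(xml_text):
    """Remove every '</name> <name>' pair so adjacent same-named elements fuse."""
    return ADJACENT_SAME_TAG.sub("", xml_text)

# Hypothetical sample with three consecutive <a> elements
sample = "<root><a><b>1</b></a> <a><c>2</c></a><a><d>3</d></a></root>"
merged = merge_adjacent(sample)
print(merged)  # <root><a><b>1</b><c>2</c><d>3</d></a></root>
```

Keep in mind that the pattern applies to every tag name, so adjacent leaf elements such as `<text>a</text><text>b</text>` would also be fused into `<text>ab</text>`, which may not be what you want.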

For more complex documents, you might want to use one of Python's many XML libraries instead.
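As one possible way to do this with a parser, here is a minimal sketch using the standard library's xml.etree.ElementTree. The helper name `merge_consecutive_siblings` is hypothetical, and the sketch assumes that only direct children of the given parent need merging and that the merged elements carry no attributes or text of their own that must be preserved.

```python
import xml.etree.ElementTree as ET

def merge_consecutive_siblings(parent, tag):
    """Fold each `tag` child into the directly preceding sibling with the same tag."""
    previous = None
    for child in list(parent):  # iterate over a copy, since children get removed
        if child.tag == tag and previous is not None and previous.tag == tag:
            previous.extend(list(child))  # move the grandchildren over
            parent.remove(child)          # drop the now redundant wrapper
        else:
            previous = child

xml_text = """<root>
    <element1>
        <element2>
            <text> random text </text>
        </element2>
    </element1>
    <element1>
        <element3>
            <text> random text </text>
        </element3>
    </element1>
</root>"""

root = ET.fromstring(xml_text)
merge_consecutive_siblings(root, "element1")
result = ET.tostring(root, encoding="unicode")
print(result)
```

The structure comes out as requested (one element1 holding element2 and element3), but the indentation of the serialized output will not match the original exactly, because the whitespace between tags is kept as it was parsed.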

tobias_k