The following is a solution using the xml.etree.ElementTree module and the example XML from your question.
The notes below correspond to the numbered comments in the code sample that follows.
1. Create root element from the source XML text.
   The xml.etree.ElementTree.fromstring() function parses the provided XML string and returns an Element instance.

2. Use XPath query to locate the new root Element.
   The findall() function returns a list of matching Element objects from the source Element object. Since you are trying to establish a new root for your new XML document, this query must be designed to match one and only one element from the source document, hence the extraction of new_root via [0]. (Insert appropriate error handling here!)
   The ElementTree module has limited XPath support, but here is a breakdown of the query string:
   - .//c : Search for all <c> elements
   - [@key='i want this'] : Filter found <c> elements and return only those with a key attribute matching 'i want this'

3. Encode new root Element to a Unicode string.
   The xml.etree.ElementTree.tostring() function renders the provided Element and its children to XML text. The encoding="unicode" is specified since the default encoding returns a byte string.
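
As a quick aside on the encoding argument, here is a minimal sketch showing the difference between the default call and encoding="unicode". The short XML string and the names elem, as_bytes and as_text are made up for illustration and are not part of the code sample.

```python
import xml.etree.ElementTree as ET

# Illustrative element only; any Element behaves the same way.
elem = ET.fromstring('<c key="i want this"><d1>the good stuff</d1></c>')

# With no encoding argument, tostring() returns a bytes object.
as_bytes = ET.tostring(elem)

# With encoding="unicode", tostring() returns a str instead.
as_text = ET.tostring(elem, encoding="unicode")

print(type(as_bytes).__name__)
print(type(as_text).__name__)
```

If you need bytes anyway (for example, to write to a file opened in binary mode), the default call may be the better fit.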
Code sample:
import xml.etree.ElementTree as ET

if __name__ == "__main__":
    # 0. Assign test XML text string.
    my_xml = '''<a>
        <b1>not interested</b1>
        <b2 key="not interested at all">
            <c key="i want this">
                <d1> the good stuff</d1>
                <d2> more good stuff </d2>
                <d3>
                    <e1 key="good">still good stuff</e1>
                </d3>
            </c>
        </b2>
    </a>'''

    # 1. Create root Element from the source XML text.
    root = ET.fromstring(my_xml)

    # 2. Use XPath query to locate the new root Element.
    new_root = root.findall(".//c[@key='i want this']")[0]

    # 3. Encode new root Element to a Unicode string.
    my_new_xml = ET.tostring(new_root, encoding="unicode")
    print(my_new_xml)
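
For the error handling mentioned in step 2, one possible approach is to check the number of matches before indexing. This is only a sketch: find_single is a hypothetical helper name, the shortened XML is for illustration, and raising ValueError is one choice among several.

```python
import xml.etree.ElementTree as ET


def find_single(root, path):
    """Return the only element matching path, or raise ValueError.

    Hypothetical helper, not part of ElementTree.
    """
    matches = root.findall(path)
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly 1 match for {path!r}, found {len(matches)}"
        )
    return matches[0]


# Shortened illustrative document.
source = '<a><b2><c key="i want this"><d1>the good stuff</d1></c></b2></a>'
root = ET.fromstring(source)

# Exactly one match: the element is returned.
new_root = find_single(root, ".//c[@key='i want this']")
print(ET.tostring(new_root, encoding="unicode"))

# No match: the helper raises instead of failing with an IndexError.
try:
    find_single(root, ".//c[@key='missing']")
except ValueError as exc:
    print(exc)
```

This guards against both a missing element and an ambiguous query that matches several elements, where [0] alone would silently pick the first.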