Specific pathing to find XML elements using minidom in Python

Question

As per this thread, I am using xml.dom.minidom to do some very basic, read-only XML traversal.

What confuses me is why its getElementsByTagName finds nodes several levels deep in the hierarchy even though I never supply their exact path.

XML:

<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
        <item name="item3"></item>
        <item name="item4"></item>
    </items>
    <secondSetOfItems>
        <item name="item5"></item>
        <item name="item6"></item>
        <item name="item7"></item>
        <item name="item8"></item>
    </secondSetOfItems>
</data>

Python code:

from xml.dom import minidom

xmldoc = minidom.parse('sampleXML.xml')
items = xmldoc.getElementsByTagName('item')

for item in items:
    print(item.attributes['name'].value)

Prints:

item1
item2
item3
item4
item5
item6
item7
item8

What bothers me is that it implicitly finds tags named item under both data->items and data->secondSetOfItems.

How do I make it follow an explicit path and only extract items under one of the two categories? E.g. under data->secondSetOfItems:

item5
item6
item7
item8

score 7 · Accepted Answer · edited Jan 14 '16 at 15:36


If you want to get items from a specific category, grab the parent element first and then call getElementsByTagName on it, so that the search only covers that element's descendants.

For example:

Code:

from xml.dom import minidom

xmldoc = minidom.parse('sampleXML.xml')
# Grab the first occurrence of the "secondSetOfItems" element
second_items = xmldoc.getElementsByTagName("secondSetOfItems")[0]
# Search for "item" only among that element's descendants
item_list = second_items.getElementsByTagName("item")

for item in item_list:
    print(item.attributes['name'].value)

Output:

item5
item6
item7
item8
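Note that second_items.getElementsByTagName("item") still searches every descendant of secondSetOfItems, not only its direct children. If that element could ever contain item elements nested further down, a stricter option is to walk childNodes yourself. The sketch below uses a hypothetical child_elements helper and an inline sample document (with an assumed nested <group> element) so it runs without sampleXML.xml:

```python
from xml.dom import minidom

# Inline sample; the nested <group> element is a hypothetical addition
XML = """<data>
    <items>
        <item name="item1"></item>
        <item name="item2"></item>
    </items>
    <secondSetOfItems>
        <item name="item5"></item>
        <item name="item6"></item>
        <group>
            <item name="nested"></item>
        </group>
    </secondSetOfItems>
</data>"""


def child_elements(parent, tag_name):
    """Return only the direct child elements of `parent` named `tag_name`."""
    return [node for node in parent.childNodes
            if node.nodeType == node.ELEMENT_NODE and node.tagName == tag_name]


xmldoc = minidom.parseString(XML)
second_items = xmldoc.getElementsByTagName("secondSetOfItems")[0]

# getElementsByTagName searches all descendants, so the nested item is included
all_items = [i.getAttribute("name") for i in second_items.getElementsByTagName("item")]
# Walking childNodes looks exactly one level down
direct_items = [i.getAttribute("name") for i in child_elements(second_items, "item")]

print(all_items)     # ['item5', 'item6', 'nested']
print(direct_items)  # ['item5', 'item6']
```

For the XML in the question both approaches return item5 through item8; they only differ when item elements are nested deeper.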

answered Jan 14 '14 at 21:28 by Dave Tucker; edited Jan 14 '16 at 15:36 by eh1160

Great, thanks. One more question: say I had a value inside the item tag, `XYZ`. How would I read the XYZ? I tried `item.nodeValue` to no avail. – amphibient Jan 14 '14 at 21:33

Never mind, it should be `item.childNodes[0].nodeValue`. – amphibient Jan 14 '14 at 21:38
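For reference: in the DOM, an element node's own nodeValue is always None; its text is stored in a child Text node, which is why item.childNodes[0].nodeValue works. That index lookup raises IndexError on an empty element such as <item name="item1"></item>, and it only returns the first text node if the content is interrupted by a comment or child element. A minimal sketch, assuming a sample item whose text is XYZ and a hypothetical element_text helper:

```python
from xml.dom import minidom

# Assumed sample: an item element whose text content is XYZ
doc = minidom.parseString('<items><item name="item1">XYZ</item></items>')
item = doc.getElementsByTagName("item")[0]

# An Element node has no value of its own; nodeValue is None
print(item.nodeValue)                # None
# The text lives in the element's first child, a Text node
print(item.childNodes[0].nodeValue)  # XYZ


def element_text(element):
    """Join the data of all direct Text/CDATA children (hypothetical helper).

    Returns "" for an empty element instead of raising IndexError.
    """
    return "".join(node.data for node in element.childNodes
                   if node.nodeType in (node.TEXT_NODE, node.CDATA_SECTION_NODE))


print(element_text(item))            # XYZ
```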

score 1 · Answer 2 · edited May 23 '17 at 11:58

This is the documented behavior of getElementsByTagName:

Search for all descendants (direct children, children’s children, etc.) with a particular element type name.

Someone wrote a "filter" on top of it; see this answer.
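The linked answer isn't reproduced here. As a hypothetical illustration of the same idea, the sketch below filters the results of getElementsByTagName by each element's chain of ancestor tag names, so only elements at an exact path from the root are kept. The elements_by_path helper and the inline sample are assumptions, not part of minidom:

```python
from xml.dom import minidom

XML = """<data>
    <items><item name="item1"/><item name="item2"/></items>
    <secondSetOfItems><item name="item5"/><item name="item6"/></secondSetOfItems>
</data>"""


def elements_by_path(doc, path):
    """Return elements whose exact tag path from the root equals `path`,
    e.g. "data/secondSetOfItems/item" (hypothetical helper)."""
    *ancestors, leaf = path.split("/")
    matches = []
    for element in doc.getElementsByTagName(leaf):
        # Collect ancestor tag names, nearest first; stop at the Document node
        chain = []
        parent = element.parentNode
        while parent is not None and parent.nodeType == parent.ELEMENT_NODE:
            chain.append(parent.tagName)
            parent = parent.parentNode
        # Keep the element only if its ancestors, read root-first, match the path
        if chain[::-1] == ancestors:
            matches.append(element)
    return matches


doc = minidom.parseString(XML)
print([e.getAttribute("name") for e in elements_by_path(doc, "data/secondSetOfItems/item")])
# ['item5', 'item6']
```

Because the path is matched from the root, "secondSetOfItems/item" on its own matches nothing here; the full "data/secondSetOfItems/item" is required.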

It seems to me that minidom is too simple for this; consider using lxml with XPath:

tree.xpath('//secondSetOfItems/item/@name')

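or BeautifulSoup's findAll (name is an attribute, so read it with item['name'] rather than searching for a name tag):

[item['name'] for item in data.secondSetOfItems.findAll('item')]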
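Neither lxml nor BeautifulSoup ships with Python. If adding a dependency isn't an option, the standard library's xml.etree.ElementTree supports a limited XPath subset that covers this case. It cannot select attributes with @name, so the attribute is read with get() instead. A minimal sketch using an inline sample (for a file, ET.parse('sampleXML.xml').getroot() gives the same root element):

```python
import xml.etree.ElementTree as ET

XML = """<data>
    <items><item name="item1"/><item name="item2"/></items>
    <secondSetOfItems><item name="item5"/><item name="item6"/></secondSetOfItems>
</data>"""

root = ET.fromstring(XML)  # root is the <data> element

# The path is relative to root: only <item> children of <secondSetOfItems>
names = [item.get("name") for item in root.findall("secondSetOfItems/item")]
print(names)  # ['item5', 'item6']

# The same subset also supports attribute predicates
print(root.find(".//item[@name='item6']").get("name"))  # item6
```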
