Not recursive (single node level) getElementsByTagName in Python xml.dom

Question

Is there any way to use getElementsByTagName only at a single node level and not recursively?

E.g. consider parsing a pom.xml file:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
        <version>1.0-SNAPSHOT</version>
        <relativePath>../pom.xml</relativePath>
    </parent>

    <modelVersion>2.0.0</modelVersion>
    <groupId>com.parent.somemodule</groupId>
    <artifactId>some_module</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>Some Module</name>
    ...

If I want to get groupId at the top level (specifically project->groupId, not project->parent->groupId), I use:

from xml.dom import minidom
xmldoc = minidom.parse('pom.xml')
groupId = xmldoc.getElementsByTagName("groupId")[0].childNodes[0].nodeValue

But unfortunately, that finds the first physical occurrence of groupId in the file regardless of the hierarchy level, which is project->parent->groupId. I actually want to do a non-recursive find ONLY at a specific node level, not within its children. Is there a way to do it in xml.dom?

UPDATE: I switched to BeautifulSoup but am still having the same problem with implicit recursive traversal; see "Finding a nonrecursive DOM subnode in Python using BeautifulSoup".

score 3 · Accepted Answer · answered Jan 15 '14 at 17:40


You can iterate over the getElementsByTagName() results and take the first element whose parent is the root element:

# Take the first groupId whose parent is the root <project> element.
group_id_element = next(element for element in xmldoc.getElementsByTagName("groupId")
                        if element.parentNode == xmldoc.documentElement)

print(group_id_element.childNodes[0].nodeValue)
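As a variation on the same idea, here is a minimal sketch that never scans the whole tree: it looks only at the direct children of the document element. The helper name `direct_children` and the inline `POM` string are assumptions made for illustration (a trimmed-down version of the file in the question), not something from the original post.

```python
from xml.dom import minidom

# Trimmed-down POM, assumed to have the same shape as the file in the question.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
    </parent>
    <groupId>com.parent.somemodule</groupId>
    <artifactId>some_module</artifactId>
</project>"""


def direct_children(node, tag_name):
    """Hypothetical helper: element children of `node` named `tag_name`, no descent."""
    return [
        child for child in node.childNodes
        # Skip whitespace text nodes and comments; keep only matching elements.
        if child.nodeType == child.ELEMENT_NODE and child.tagName == tag_name
    ]


xmldoc = minidom.parseString(POM)
matches = direct_children(xmldoc.documentElement, "groupId")
group_id = matches[0].firstChild.nodeValue
print(group_id)
```

Because only `childNodes` of the root is inspected, the groupId nested under parent is never visited.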

Note that it would be easier, shorter and faster to do the same with ElementTree, which is also part of the standard library.
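A rough sketch of what that could look like with ElementTree follows. The inline `POM` string and the `m` prefix in the `NS` mapping are assumptions chosen for illustration; the namespace URI is the one from the question's pom.xml. With a bare tag path, `find()` matches direct children only, which is the non-recursive behaviour asked for.

```python
import xml.etree.ElementTree as ET

# Trimmed-down POM, assumed to have the same shape as the file in the question.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <parent>
        <groupId>com.parent</groupId>
        <artifactId>parent</artifactId>
    </parent>
    <groupId>com.parent.somemodule</groupId>
    <artifactId>some_module</artifactId>
</project>"""

# The prefix "m" is arbitrary; it just maps to the POM's default namespace.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}

root = ET.fromstring(POM)

# A bare tag path searches direct children of `root` only.
top_level = root.find("m:groupId", NS)
# An explicit path reaches the nested element when that one is wanted.
parent_level = root.find("m:parent/m:groupId", NS)

print(top_level.text)
print(parent_level.text)
```

For a file on disk, `ET.parse('pom.xml').getroot()` would take the place of `ET.fromstring(POM)`.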

Hope that helps.

answered Jan 15 '14 at 17:40

alecxe


so are you saying `ElementTree` is more granular and sophisticated? – amphibient Jan 15 '14 at 17:43

@amphibient well, this is my opinion, of course. When I need to parse an xml file, I prefer to use `ElementTree`, `lxml` or `BeautifulSoup`. – alecxe Jan 15 '14 at 17:53
can you also please see http://stackoverflow.com/questions/21146417/simple-dom-traversing-in-python-using-xml-etree-elementtree ? Thanks – amphibient Jan 15 '14 at 19:26
see also http://stackoverflow.com/questions/21147686/finding-a-nonrecursive-dom-subnode-in-python-using-beautifulsoup – amphibient Jan 15 '14 at 20:38
