Is there any way to use getElementsByTagName only at a single node level and not recursively?
E.g. consider parsing a pom.xml file:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<parent>
<groupId>com.parent</groupId>
<artifactId>parent</artifactId>
<version>1.0-SNAPSHOT</version>
<relativePath>../pom.xml</relativePath>
</parent>
<modelVersion>2.0.0</modelVersion>
<groupId>com.parent.somemodule</groupId>
<artifactId>some_module</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>Some Module</name>
...
If I want to get groupId at the top level (specifically project->groupId, not project->parent->groupId), I use:
from xml.dom import minidom

xmldoc = minidom.parse('pom.xml')
groupId = xmldoc.getElementsByTagName("groupId")[0].childNodes[0].nodeValue
Unfortunately, that finds the first occurrence of groupId in document order regardless of hierarchy level, which here is project->parent->groupId. What I actually want is a non-recursive find that looks ONLY at the direct children of a specific node, not at anything nested deeper. Is there a way to do this in xml.dom?
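To make the goal concrete, here is a minimal sketch of the kind of lookup I mean: walk childNodes of one node and filter by element name, with no descent into grandchildren. The helper name direct_children is hypothetical (it is not part of xml.dom), and the inline XML string is a trimmed-down stand-in for the pom.xml above so the snippet runs on its own.

```python
from xml.dom import Node, minidom

# Assumption: a shortened copy of the pom.xml above, inlined so the example is self-contained.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.parent</groupId>
    <artifactId>parent</artifactId>
  </parent>
  <groupId>com.parent.somemodule</groupId>
  <artifactId>some_module</artifactId>
</project>"""


def direct_children(node, tag_name):
    """Hypothetical helper: element children of `node` named `tag_name`, no recursion."""
    return [
        child
        for child in node.childNodes
        # Skip whitespace text nodes and comments; keep only matching elements.
        if child.nodeType == Node.ELEMENT_NODE and child.tagName == tag_name
    ]


xmldoc = minidom.parseString(POM)
project = xmldoc.documentElement

# Only project->groupId matches; project->parent->groupId is never visited.
group_id = direct_children(project, "groupId")[0].firstChild.nodeValue
print(group_id)  # com.parent.somemodule
```

This compares against tagName, so it would need adjusting (for example, comparing localName) if the elements carried a namespace prefix.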
UPDATE: I switched to BeautifulSoup, but I still have the same problem with implicit recursive traversal. See: Finding a nonrecursive DOM subnode in Python using BeautifulSoup