Root element confusion using ElementTree

Question

I'm using ElementTree in Python 3.5.1. I want to parse an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <name>A name</name>
    <groupId>a.group</groupId>
    <artifactId>anArtifact</artifactId>
    <version>1.0</version>
    <packaging>pom</packaging>
    <properties>
        <dependency-version>10.0</dependency-version>
        <another-dependency-version>11.0</another-dependency-version>
    </properties>
</project>

I want to get the value of the dependency-version tag. I started by trying to get the properties element with this code:

mydoc = ElementTree.parse(sources + "pom.xml")
root = mydoc.getroot()
for element in root.findall('properties'):
    print(element)

The issue is that the loop prints nothing; all I can get is the root tag, project, and its attributes.

>>> root.tag
'{http://maven.apache.org/POM/4.0.0}project'
>>> root.text
'\n    '
>>> root.attrib
{'{http://www.w3.org/2001/XMLSchema-instance}schemaLocation': 'http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd'}

I also tried other paths, on root and on mydoc directly:

>>> root.findall('project')
[]
>>> mydoc.findall('./properties')
[]
>>> mydoc.findall('./project/properties') 
[]

My understanding was that getroot() gives me the project element and that I can start working from there, but it seems I got something wrong.

EDIT

I followed the proposed solution and I got:

>>> ns
{'sm': 'http://maven.apache.org/POM/4.0.0'}
>>> mydoc.findall('.//sm:properties', ns)
[<Element '{http://maven.apache.org/POM/4.0.0}properties' at 0x0325AA80>]
>>> root.findall('.//sm:properties', ns)
[<Element '{http://maven.apache.org/POM/4.0.0}properties' at 0x0325AA80>]
>>> mydoc.findall('.//sm:properties/dependency-version', ns)
[]

It seems to find something now, but not the two child elements of the properties tag.

It's because you're not taking the default namespace (`http://maven.apache.org/POM/4.0.0`) into account. See my answer here for an example: https://stackoverflow.com/a/52864678/317052 — Daniel Haley, Jan 14 '19 at 16:49
Possible duplicate of [Parsing text from XML node in Python](https://stackoverflow.com/questions/52847343/parsing-text-from-xml-node-in-python) — mzjn, Jan 14 '19 at 17:39
Please don't add solutions in the question. Post a proper answer instead. You can accept your own answer. — mzjn, Jan 16 '19 at 08:37
`mydoc.findall('.//sm:properties/sm:dependency-version', ns)` works. The prefix must be used on all elements. — mzjn, Jan 16 '19 at 12:07

score 0 · Accepted Answer · answered Jan 16 '19 at 11:40

In the end I took an idea from: Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall". The idea is, basically, to get rid of the namespaces before parsing.

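import re
import xml.etree.ElementTree as ElementTree

# Read the whole file and close it afterwards.
with open("C:/temp/test.xml", "r", encoding="utf-8") as f:
    filestring = f.read()
# Replace the entire opening <project ...> tag, including its closing '>',
# so the namespace declarations disappear and no stray '>' is left behind.
xmlwithoutns = re.sub(r'<project[^>]*>', '<project>', filestring, count=1)
tree = ElementTree.fromstring(xmlwithoutns)
value = tree.findall("properties/dependency-version")[0].text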
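
For comparison, here is a minimal sketch of the approach from the comments on the question, which keeps the namespace and prefixes every step of the path instead of editing the XML text. The POM constant is a trimmed, inlined copy of the file from the question, added only so the snippet runs on its own; the sm prefix is arbitrary and just has to match the key in the ns dictionary.

```python
import xml.etree.ElementTree as ElementTree

# Trimmed copy of the POM from the question, inlined for illustration only.
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <properties>
        <dependency-version>10.0</dependency-version>
        <another-dependency-version>11.0</another-dependency-version>
    </properties>
</project>"""

# Map a prefix of our choosing to the document's default namespace.
ns = {"sm": "http://maven.apache.org/POM/4.0.0"}

root = ElementTree.fromstring(POM)

# Every step in the path carries the prefix, because the child elements
# inherit the default namespace declared on <project>.
element = root.find("sm:properties/sm:dependency-version", ns)
value = element.text if element is not None else None
print(value)
```

This avoids the regular expression entirely, at the cost of repeating the prefix on each path step.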
