I'm using ElementTree in Python 3.5.1. I want to parse an XML file like this:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <name>A name</name>
    <groupId>a.group</groupId>
    <artifactId>anArtifact</artifactId>
    <version>1.0</version>
    <packaging>pom</packaging>
    <properties>
        <dependency-version>10.0</dependency-version>
        <another-dependency-version>11.0</another-dependency-version>
    </properties>
</project>

I want to get the value of the dependency-version tag. I started by trying to get the properties element with this code:

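import xml.etree.ElementTree as ElementTree

# `sources` is a path prefix defined elsewhere in my script
mydoc = ElementTree.parse(sources + "pom.xml")
root = mydoc.getroot()
for element in root.findall('properties'):
    print(element)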

The problem is that the loop prints nothing. All I can reach is the root tag, project, and its attributes:

>>> root.tag
'{http://maven.apache.org/POM/4.0.0}project'
>>> root.text
'\n    '
>>> root.attrib
{'{http://www.w3.org/2001/XMLSchema-instance}schemaLocation': 'http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd'}

I also tried other paths, on both root and mydoc:

>>> root.findall('project')
[]
>>> mydoc.findall('./properties')
[]
>>> mydoc.findall('./project/properties') 
[]

I understood that getroot() would give me the project element and that I could start working from there, but it seems I got something wrong.

EDIT

I followed the proposed solution and I got:

>>> ns
{'sm': 'http://maven.apache.org/POM/4.0.0'}
>>> mydoc.findall('.//sm:properties', ns)
[<Element '{http://maven.apache.org/POM/4.0.0}properties' at 0x0325AA80>]
>>> root.findall('.//sm:properties', ns)
[<Element '{http://maven.apache.org/POM/4.0.0}properties' at 0x0325AA80>]
>>> mydoc.findall('.//sm:properties/dependency-version', ns)
[]

It seems to be finding something now, but not the two child elements of the properties tag.

Franjavi
  • It's because you're not taking the default namespace (`http://maven.apache.org/POM/4.0.0`) into account. See my answer here for an example: https://stackoverflow.com/a/52864678/317052 – Daniel Haley Jan 14 '19 at 16:49
  • Possible duplicate of [Parsing text from XML node in Python](https://stackoverflow.com/questions/52847343/parsing-text-from-xml-node-in-python) – mzjn Jan 14 '19 at 17:39
  • I got a different solution removing the namespaces. – Franjavi Jan 15 '19 at 10:03
  • Please don't add solutions in the question. Post a proper answer instead. You can accept your own answer. – mzjn Jan 16 '19 at 08:37
  • `mydoc.findall('.//sm:properties/sm:dependency-version', ns)` works. The prefix must be used on all elements. – mzjn Jan 16 '19 at 12:07
  • Yes, that worked also, thanks! I have not thought on that – Franjavi Jan 16 '19 at 15:51
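
For readers following the comments, here is a minimal sketch of the namespace-aware lookup they describe. It assumes a trimmed copy of the POM embedded as a string (the name `POM` is made up for this example) so that it runs on its own; the `sm` prefix and the `ns` mapping are the ones from the question's EDIT. It also shows why the last query in the EDIT came back empty: an unprefixed step only matches elements in no namespace.

```python
import xml.etree.ElementTree as ElementTree

# Trimmed copy of the POM from the question (hypothetical inline string,
# used here instead of reading pom.xml from disk).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
    <properties>
        <dependency-version>10.0</dependency-version>
        <another-dependency-version>11.0</another-dependency-version>
    </properties>
</project>"""

# Map a prefix of our choosing to the document's default namespace URI.
ns = {'sm': 'http://maven.apache.org/POM/4.0.0'}

root = ElementTree.fromstring(POM)

# Every step of the path carries the prefix, as the comment points out.
dependency_version = root.find('sm:properties/sm:dependency-version', ns).text

# Leaving the prefix off the last step looks for a tag in no namespace,
# so nothing matches.
unprefixed = root.findall('.//sm:properties/dependency-version', ns)

# Collect all children of <properties>, dropping the '{uri}' part of each tag.
versions = {
    child.tag.split('}', 1)[1]: child.text
    for child in root.find('sm:properties', ns)
}

print(dependency_version)
print(unprefixed)
print(versions)
```

The prefix in the query does not have to match anything in the file; only the URI it maps to matters.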

1 Answer

In the end I took an idea from the question "Python ElementTree module: How to ignore the namespace of XML files to locate matching element when using the method "find", "findall"". The idea is, basically, to get rid of the namespaces before parsing.

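import re
import xml.etree.ElementTree as ElementTree

# Read the whole file as text and close it afterwards
with open("C:/temp/test.xml", "r", encoding="utf-8") as f:
    filestring = f.read()
# Replace the opening <project ...> tag, including its closing '>',
# with a bare <project> so that no namespace declarations remain
xmlwithoutns = re.sub(r'<project[^>]*>', '<project>', filestring, count=1)
tree = ElementTree.fromstring(xmlwithoutns)
value = tree.findall("properties/dependency-version")[0].text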
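
A variation on the same idea, in case editing the raw text with a regex feels fragile: parse first, then strip the `{uri}` part from each tag. This is only a sketch and a different technique from the regex above. The helper name `strip_namespaces` and the inline `POM` string are made up for this example, and it only cleans element tags, not namespaced attributes such as xsi:schemaLocation.

```python
import xml.etree.ElementTree as ElementTree

# Trimmed copy of the POM from the question (hypothetical inline string).
POM = """<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <properties>
        <dependency-version>10.0</dependency-version>
        <another-dependency-version>11.0</another-dependency-version>
    </properties>
</project>"""


def strip_namespaces(root):
    """Remove the '{uri}' part from every element tag, in place."""
    for element in root.iter():
        # Tags of ordinary elements are strings like '{uri}name'.
        if isinstance(element.tag, str) and element.tag.startswith('{'):
            element.tag = element.tag.split('}', 1)[1]
    return root


tree = strip_namespaces(ElementTree.fromstring(POM))

# Plain, unprefixed paths now match.
value = tree.find("properties/dependency-version").text
print(value)
```

Either way the namespace information is lost, so this only suits cases where the tags are read and never written back as a valid POM.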
Franjavi