My XML file is available here. I am able to get the root node and its child nodes from this file, but I cannot get the one I need. I want to get the content of <ce:section-title>Methods</ce:section-title>. I have tried both the xml and lxml packages.

When I use the following,

import lxml.etree

tree = lxml.etree.parse(fname)  # fname is the XML filename
root = tree.getroot()

print(root[5].findall("ce:section-title", root.nsmap))

It just prints an empty list, []. I get the same empty result with the following:

for item in tree.iter('{http://www.elsevier.com/xml/ja/dtd}ce:section-title'):
    print(item)

I also tried the solution provided here, but this code raises the following error:

ns = {"ce": "http://www.elsevier.com/xml/common/dtd"}
print(root.find("ce:title", ns).text)

AttributeError: 'NoneType' object has no attribute 'text'

Any direction would be helpful.

user3050590

1 Answer


It should work with findall(".//ce:section-title", root.nsmap).

With .// prepended, you are searching for section-title descendants at all levels below the context node. With findall("ce:section-title", root.nsmap), only direct child elements of the context node can be found.
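To see the difference in isolation, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree, which accepts the same prefix-to-URI dict as the second argument to findall, on a small hypothetical inline document rather than the real file. It assumes the ce prefix maps to http://www.elsevier.com/xml/common/dtd, the URI from the question's ns dict. It also shows Clark notation, where the namespace URI in braces replaces the prefix: the tag is written {uri}section-title, not {uri}ce:section-title.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed sample document (not the asker's real file).
# Assumption: the "ce" URI is the one used in the question's ns dict.
XML = """<doc xmlns:ce="http://www.elsevier.com/xml/common/dtd">
  <body>
    <ce:sections>
      <ce:section>
        <ce:section-title>Introduction</ce:section-title>
      </ce:section>
      <ce:section>
        <ce:section-title>Methods</ce:section-title>
      </ce:section>
    </ce:sections>
  </body>
</doc>"""

CE = "http://www.elsevier.com/xml/common/dtd"
ns = {"ce": CE}

root = ET.fromstring(XML)

# No ".//": only direct children of <doc> are examined, so nothing matches.
direct = root.findall("ce:section-title", ns)

# With ".//": descendants at any depth are examined.
nested = [e.text for e in root.findall(".//ce:section-title", ns)]

# Clark notation: "{URI}localname", with no "ce:" prefix inside the tag.
clark = [e.text for e in root.iter("{" + CE + "}section-title")]

print(direct)  # []
print(nested)  # ['Introduction', 'Methods']
print(clark)   # ['Introduction', 'Methods']
```

With lxml the same paths work on an lxml tree; passing root.nsmap instead of an explicit dict relies on the prefix being declared on the root element.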

Example:

from lxml import etree

tree = etree.parse("data.xml")  # Your XML
root = tree.getroot()

for e in root.findall(".//ce:section-title", root.nsmap):
    print(e.text)

Output:

Abstract
Keywords
Introduction
Materials and methods
Results
The appearing species by taxon
List of regional appearing species
Discussion
Acknowledgments
References
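Since the question asks for one particular heading, a possible follow-up is to filter the matches by text and check for a missing result before touching .text. find() returns None when nothing matches, and calling .text on None is what raises the AttributeError in the question. The sketch below is a hypothetical helper, not part of any library. It uses a case-insensitive substring match as a design choice, because in this file the heading reads "Materials and methods" rather than exactly "Methods". For a real file you would get root from ET.parse(fname).getroot() instead of the inline sample.

```python
import xml.etree.ElementTree as ET

# Assumption: same "ce" namespace URI as in the question's ns dict.
CE_NS = {"ce": "http://www.elsevier.com/xml/common/dtd"}


def find_section_title(root, wanted):
    """Hypothetical helper: return the first ce:section-title element whose
    text contains `wanted` (case-insensitive), or None if there is none."""
    for title in root.iterfind(".//ce:section-title", CE_NS):
        if wanted.lower() in (title.text or "").lower():
            return title
    return None


# Hypothetical inline sample standing in for the real file.
XML = """<doc xmlns:ce="http://www.elsevier.com/xml/common/dtd">
  <ce:section><ce:section-title>Introduction</ce:section-title></ce:section>
  <ce:section><ce:section-title>Materials and methods</ce:section-title></ce:section>
</doc>"""

root = ET.fromstring(XML)
title = find_section_title(root, "methods")

# Guard against None: None.text is what produced the AttributeError.
if title is not None:
    print(title.text)  # Materials and methods
else:
    print("no matching section title")
```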
mzjn