The word/_rels/document.xml.rels file in a .docx package has a root element that declares a default (unprefixed) namespace: <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">. Because of this, I can't use the findall method to get its child nodes.

Simplified example:

>>> from lxml import etree
>>> etree.fromstring(b'<x><y id="1"/><y id="2"/></x>').findall('y')
[<Element y at 0x382d788>, <Element y at 0x382db48>]
>>> etree.fromstring(b'<x xmlns="wow"><y id="1"/><y id="2"/></x>').findall('y')
[]
# How can I find these child nodes, as in the previous example?
har07
tcpiper

1 Answer

This works the same way as with the built-in xml.etree.ElementTree, and lxml offers one more option through its xpath() method:

>>> from lxml import etree
>>> root = etree.fromstring(b'<x xmlns="wow"><y id="1"/><y id="2"/></x>')

>>> root.findall('{wow}y')
[<Element {wow}y at 0x2b489c8>, <Element {wow}y at 0x2b48588>]

>>> ns = {'d': 'wow'}
>>> root.findall('d:y', ns)
[<Element {wow}y at 0x2b489c8>, <Element {wow}y at 0x2b48588>]
>>> root.xpath('d:y', namespaces=ns)
[<Element {wow}y at 0x2b489c8>, <Element {wow}y at 0x2b48588>]

Note that descendant elements without a prefix implicitly inherit the default namespace of their ancestor. That is why you need to take the namespace into account when selecting <y>, even though the namespace was declared on the parent element <x>.
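Applied to the relationships file from the question, the same idea might look like the sketch below. It uses the standard-library xml.etree.ElementTree rather than lxml so it runs without extra dependencies. The XML string is a hypothetical, trimmed-down stand-in for word/_rels/document.xml.rels, and the Type URIs, the prefix "r", and the names RELS_XML and PKG_NS are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the question's <Relationships> element.
PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships"

# Hypothetical stand-in for word/_rels/document.xml.rels (not a real file).
RELS_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Type="http://example.com/type/styles" Target="styles.xml"/>
  <Relationship Id="rId2" Type="http://example.com/type/image" Target="media/image1.png"/>
</Relationships>"""

root = ET.fromstring(RELS_XML)

# A bare tag name matches nothing, because the children live in PKG_NS.
unqualified = root.findall("Relationship")

# Option 1: map an arbitrary prefix (here "r") to the namespace URI.
ns = {"r": PKG_NS}
by_prefix = root.findall("r:Relationship", ns)

# Option 2: spell out the namespace in {uri}tag notation.
by_clark = root.findall(f"{{{PKG_NS}}}Relationship")

# Unprefixed attributes are not in any namespace, so plain names work here.
targets = [rel.get("Target") for rel in by_prefix]
print(targets)
```

The prefix in the mapping is only a local alias. It does not have to match anything in the document, which is why it also works when the document itself uses no prefix at all.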

har07