I have a fairly complicated XML document that I want to parse. Here is a simplified version of it:

<file
    xmlns="http://www.namespace.co.il"
    Media="MetTTV"
    Date="2015-03-29"
    FileType="Consolidated"
    SchemaVersion="1.2">

    <H Id="1012532" W="2198.05">
        ///more tags
    </H>
    <H Id="623478" W="3215.05">
        ///more tags
    </H>
   etc.
</file>

I want to access the <H> tags in order to count them.

Here is my code:

import lxml.etree
tree = lxml.etree.parse(xml_file)
count = 1
for HH in tree.xpath('//H'):
    print(count)
    count = count + 1

This code works fine if I delete the

xmlns="http://www.namespace.co.il"

line.

But if I leave it in, nothing is printed to the console.

I tried several variations of the loop, such as

for HH in tree.xpath('//{http://www.namespace.co.il}H'):

or with

ns={'nmsp':'http://www.namespace.co.il'}
for HH in tree.xpath('//nmsp:H', ns):

but none of them works.

Binyamin Even
  • Possible duplicate of [lxml etree xmlparser namespace problem](http://stackoverflow.com/questions/4255277/lxml-etree-xmlparser-namespace-problem) – Keith Hall Jan 06 '16 at 10:20

1 Answer

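lxml's xpath method accepts the prefix-to-URI mapping only as a keyword argument named namespaces; passing the dictionary positionally, as in tree.xpath('//nmsp:H', ns), does not work.

The findall method is more lenient: it accepts the mapping either positionally or as the namespaces keyword argument, and it also understands namespace URIs written in curly braces in front of the tag name.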

All these variants work:

for HH in tree.xpath('//nmsp:H', namespaces=ns):

for HH in tree.findall('//{http://www.namespace.co.il}H'):

for HH in tree.findall('//nmsp:H', namespaces=ns):

for HH in tree.findall('//nmsp:H', ns):

See also http://lxml.de/xpathxslt.html#xpath.
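
As a rough check of the variants above, here is a minimal self-contained sketch. It assumes Python 3 with lxml installed and uses an inline copy of the simplified document from the question, with the nested tags left out, in place of xml_file. The names XML, unqualified, via_xpath, via_clark, via_findall and h_count are made up for illustration. The first query shows why the original loop printed nothing: in XPath 1.0 an unprefixed name such as H only matches elements in no namespace, while the xmlns attribute on the root puts every H into http://www.namespace.co.il.

```python
import io

import lxml.etree

# Inline stand-in for xml_file, modeled on the simplified document in the question.
XML = b"""<file xmlns="http://www.namespace.co.il"
      Media="MetTTV" Date="2015-03-29"
      FileType="Consolidated" SchemaVersion="1.2">
    <H Id="1012532" W="2198.05"/>
    <H Id="623478" W="3215.05"/>
</file>"""

tree = lxml.etree.parse(io.BytesIO(XML))

# The prefix 'nmsp' is arbitrary; only the URI has to match the document.
ns = {'nmsp': 'http://www.namespace.co.il'}

# Unprefixed name: looks for H in no namespace, so nothing matches.
unqualified = tree.xpath('//H')

# The mapping must be passed as the 'namespaces' keyword argument.
via_xpath = tree.xpath('//nmsp:H', namespaces=ns)

# findall understands the {uri}tag form and also accepts a prefix mapping.
via_clark = tree.findall('//{http://www.namespace.co.il}H')
via_findall = tree.findall('//nmsp:H', namespaces=ns)

# XPath count() returns a float, hence the int() conversion.
h_count = int(tree.xpath('count(//nmsp:H)', namespaces=ns))

print(len(unqualified), len(via_xpath), len(via_clark), len(via_findall), h_count)
# prints: 0 2 2 2 2
```

If only the number of H elements is needed, len() on the returned list or the count() expression avoids the manual counter from the question.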

mzjn