I am using lxml with XPath to parse an EPUB 3 XHTML content file.

I want to select all the li nodes with the attribute epub:type="footnote", for example:

<li epub:type="footnote" id="fn14"> ... </li>

I cannot find the right xpath expression for it.

The expression

//*[self::li][@id]

does select all the li nodes with attribute id, but when I try

//*[self::li][@epub:type]

I get the error

lxml.etree.XPathEvalError: Undefined namespace prefix

The XML is

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
    <head>
        <meta charset="utf-8" />
        <link rel="stylesheet" href="stylesheet.css" />
    </head>
    <body> 
        <section class="footnotes">
            <hr />
            <ol>
                <li id="fn1" epub:type="footnote">
                    <p>See foo</p>
                </li>
            </ol>
        </section>
    </body>
</html>

Any suggestions on how to write the correct expression?

kjhughes
MrCastro

1 Answer

Have you declared the namespace prefix epub to lxml?

>>> tree.getroot().xpath(
...     "//li[@epub:type = 'footnote']", 
...     namespaces={'epub':'http://www.idpf.org/2007/ops'}
...     )
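To check which namespaces the document actually declares, a minimal sketch along these lines inspects the root element. It assumes a trimmed copy of the document from the question; the names `SAMPLE`, `declared`, `ns` and `with_type` are chosen here for illustration only.

```python
from lxml import etree

# Trimmed copy of the question's document (an assumption for this sketch).
# Bytes are used because lxml rejects str input that carries an encoding declaration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
    <body>
        <ol>
            <li id="fn1" epub:type="footnote"><p>See foo</p></li>
        </ol>
    </body>
</html>"""

root = etree.fromstring(SAMPLE)

# nsmap maps each prefix in scope on the element to its URI;
# the default namespace is stored under the key None.
declared = root.nsmap

# XPath 1.0 has no default namespace and lxml rejects a None prefix,
# so the default namespace gets an explicit prefix ("xhtml" is an arbitrary choice).
ns = {(prefix or "xhtml"): uri for prefix, uri in declared.items()}

# With the epub prefix bound, a query on the attribute alone no longer raises
# "Undefined namespace prefix".
with_type = root.xpath("//*[@epub:type]", namespaces={"epub": ns["epub"]})

print(root.tag)        # the tag carries the namespace in {uri}local form
print(ns)
print(len(with_type))
```

The prefix used in the XPath expression does not have to match the one in the document; only the namespace URI has to be the same.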

Update (after the question was edited)

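The XHTML namespace is also tripping you up: li is in the default namespace http://www.w3.org/1999/xhtml, so an unprefixed li in the XPath expression does not match it. Try: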

>>> tree.getroot().xpath(
...     "//xhtml:li[@epub:type = 'footnote']", 
...     namespaces={'epub':'http://www.idpf.org/2007/ops', 'xhtml': 'http://www.w3.org/1999/xhtml'}
...     )
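For reference, here is a self-contained sketch that compares the unprefixed and the prefixed element test. It assumes a trimmed copy of the document from the question; `XHTML`, `NS`, `unqualified`, `qualified` and `footnote_ids` are illustrative names, not part of lxml.

```python
from lxml import etree

# Trimmed copy of the question's document (an assumption for this sketch).
# Bytes are used because lxml rejects str input that carries an encoding declaration.
XHTML = b"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
    <body>
        <section class="footnotes">
            <ol>
                <li id="fn1" epub:type="footnote"><p>See foo</p></li>
                <li id="other"><p>Not a footnote</p></li>
            </ol>
        </section>
    </body>
</html>"""

tree = etree.ElementTree(etree.fromstring(XHTML))

# Prefix-to-URI bindings used only by the XPath expressions below.
NS = {
    "epub": "http://www.idpf.org/2007/ops",
    "xhtml": "http://www.w3.org/1999/xhtml",
}

# Unprefixed "li" means "li in no namespace", so nothing matches.
unqualified = tree.getroot().xpath("//li[@epub:type = 'footnote']", namespaces=NS)

# Prefixed "xhtml:li" matches li elements in the XHTML namespace.
qualified = tree.getroot().xpath("//xhtml:li[@epub:type = 'footnote']", namespaces=NS)

footnote_ids = [li.get("id") for li in qualified]
print(len(unqualified), footnote_ids)
```

The unprefixed query returning an empty list is the same symptom as a result of length 0: the attribute test is fine, but the element test never matches.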
kjhughes
  • Indeed, but it didn't work. `li = tree.getroot().xpath("//li[@epub:type = 'footnote']", namespaces={'epub':'http://www.idpf.org/2007/ops'})` followed by `print len(li)` prints `0`. If I try `print(tree.getroot())`, the result is `` when I was expecting it to be: ``. Do you think that means the tree namespace is not http://www.idpf.org/2007/ops? – MrCastro May 07 '14 at 14:54