I am using lxml in Python 2.7 to parse an XML file.

The file looks like this:

...
<LM>sua</LM>
<LM>citt&agrave;</LM>
<LM>e</LM>
<LM>l'</LM>
<LM>alto</LM>
<LM>seggio</LM>:
     </l><l>
<LM>oh</LM>
<LM>felice</LM>
<LM>colui</LM>
<LM>cu'</LM>
<LM>ivi</LM>
<LM>elegge</LM>!.
     </l><l>
<LM> E</LM>
<LM>io</LM>
<LM>a</LM>
<LM>lui</LM>:
...

I am iterating through the tree looking for LM nodes.

for node in tree.iterfind(".//LM"):
    print tree.getpath(node.getparent())

and I get the following output for each node:

'/TEI.2/text/body/div1/l[480]'

So, in this case, the current LM node is under the 480th l node. Is there a way to get this 480 other than the following?

In [77]: int(tree.getpath(node.getparent()).split('/')[5][2:].replace(']',''))
Out[77]: 480

I mean an elegant way via xpath.

Angelo
    `.getpath` is only returning a string, and I don't think lxml provides anything more granular. If you only cared about the last node you could do this `int(re.search("\[(.*?)]", tree.getelementpath(node.getparent())).groups()[0])` (but this isn't necessarily "better"). – Raceyman Sep 08 '15 at 18:40

1 Answer

So, in this case, the current LM node is under the 480th l node. Is there a way to get this 480 other than the following?

int(tree.getpath(node.getparent()).split('/')[5][2:].replace(']',''))

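If I understand you correctly, you merely want the position of the l element relative to its parent? In XPath 1.0 you can get that by counting the preceding siblings with the same name:

int(node.getparent().xpath("count(preceding-sibling::l)")) + 1

Note that this has to go through lxml's xpath() method, which evaluates full XPath 1.0 and returns count() as a float (hence the int()). find() and iterfind() only implement the limited ElementPath subset, which can return nodes but not values, so something like node.find("position()") does not work. Also keep in mind that position() is the position of the context node within the node list currently being evaluated, which is not necessarily its index under its parent.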
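For what it's worth, here is a minimal sketch of the XPath route with lxml's xpath() method (rather than find()), counting the preceding same-named siblings of the parent. The sample document and the helper name parent_index are made up for illustration, and the sketch assumes the document uses no namespaces.

```python
from lxml import etree

# Tiny stand-in for the TEI file (hypothetical data, same shape as the question).
SAMPLE = b"""<TEI.2><text><body><div1>
<l><LM>sua</LM><LM>alto</LM></l>
<l><LM>oh</LM></l>
<l><LM>E</LM><LM>io</LM></l>
</div1></body></text></TEI.2>"""


def parent_index(node):
    """Return the 1-based index of node's parent among its same-named siblings."""
    parent = node.getparent()
    # Assumes no namespaces, so parent.tag is a plain name usable in XPath.
    # count() comes back from lxml as a float, hence the int() conversion.
    return int(parent.xpath("count(preceding-sibling::%s)" % parent.tag)) + 1


root = etree.fromstring(SAMPLE)
indices = [parent_index(lm) for lm in root.iterfind(".//LM")]
print(indices)
```

This should give the same number that getpath() puts in the trailing l[...] predicate, without any string slicing.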

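If you can use XSLT in Python (lxml exposes it as etree.XSLT), a template can print position() for every LM element. Inside the template, position() is the index within the node list selected by apply-templates, here all LM elements in document order. (The step form //LM/position() is XPath 2.0 syntax and is not valid in XPath 1.0.) To get the path as well, you have to do a bit more: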

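<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />

<!-- Start at the document root and visit every LM element. -->
<xsl:template match="/">
    <xsl:apply-templates select="//LM" />
</xsl:template>

<!-- position() is the index within the node list selected above. -->
<xsl:template match="LM">
    <xsl:text>Position: </xsl:text>
    <xsl:value-of select="position()" />
    <xsl:text>, XPath: </xsl:text>
    <xsl:apply-templates select="ancestor::*" mode="path" />
    <xsl:text>&#xA;</xsl:text>
</xsl:template>

<!-- Emit one "/name" step per ancestor, outermost first. -->
<xsl:template match="*" mode="path">
    <xsl:text>/</xsl:text>
    <xsl:value-of select="name()" />
</xsl:template>

</xsl:stylesheet>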

This will output a bunch of lines like:

Position: 4, XPath: /a/b/c
Position: 9, XPath: /a/b/d
Abel
  • I understand. In this case I think I will stick to my sub-optimal solution. – Angelo Sep 09 '15 at 15:26
  • @Angelo, ok, no problem. Then you will have to loop all nodes by hand and count, as there's no support for the feature yet in Python using the libraries you currently use... – Abel Sep 09 '15 at 15:39
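
For completeness, the count-by-hand approach mentioned in the last comment could look roughly like the sketch below. It uses only the standard library's xml.etree.ElementTree; the sample data and the function name lm_with_parent_index are hypothetical, and it assumes the LM elements are direct children of l, as in the excerpt from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the file from the question.
SAMPLE = """<TEI.2><text><body><div1>
<l><LM>sua</LM><LM>alto</LM></l>
<l><LM>oh</LM></l>
<l><LM>E</LM><LM>io</LM></l>
</div1></body></text></TEI.2>"""


def lm_with_parent_index(root, parent_tag="l", child_tag="LM"):
    """Yield (1-based index of the enclosing l among its l siblings, LM text)."""
    for container in root.iter():
        position = 0
        for child in container:
            if child.tag != parent_tag:
                continue
            position += 1  # counts only siblings named parent_tag
            # Assumes LM elements sit directly under l.
            for lm in child.iterfind(child_tag):
                yield position, lm.text


pairs = list(lm_with_parent_index(ET.fromstring(SAMPLE)))
print(pairs)
```

The counter restarts for every container element, which mirrors how getpath() numbers same-named siblings.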