
I have an HTML page with this structure:

<big><b>Staff in:</b></big>
<br>
<a href='...'>Movie 1</a>
<br>
<a href='...'>Movie 2</a>
<br>
<a href='...'>Movie 3</a>
<br>
<br>
<big><b>Cast in:</b></big>
<br>
<a href='...'>Movie 4</a>

How do I select Movies 1, 2, and 3 using XPath? I wrote this query:

'//big/b[text()="Staff in:"]/following::a'

but it returns Movies 1, 2, 3, and 4. I guess I need a way to get the items after <big><b>Staff in:</b></big> but before the next <big>.

Thanks,

jcnesci
  • Googling `xpath select items between items` seems to yield good results? Always check that out first. – Pekka Jun 11 '15 at 07:52
  • possible duplicate of [XPath select all elements between two specific elements](http://stackoverflow.com/questions/10859703/xpath-select-all-elements-between-two-specific-elements) – Pekka Jun 11 '15 at 07:53
  • You're right @Pekka웃 the search has some good results already and I hadn't seen that one, but the answer doesn't work for me, probably because of my lack of understanding of Xpath. Either way, I found the perfect answer for my case here. Thanks though, – jcnesci Jun 16 '15 at 04:05

3 Answers


Assuming that <big><b>Staff in:</b></big> is a unique element that we can use as an 'anchor', you can try this:

//big[b='Staff in:']/following-sibling::a[preceding-sibling::big[1][b='Staff in:']]

Basically, the XPath finds all <a> elements that are following siblings of the 'anchor' <big> element mentioned above, and restricts the result to those whose nearest preceding <big> sibling is the anchor element.

Output in an XPath tester, using the markup from the question as input (with minimal adjustments to make it well-formed XML):

Element='<a href="...">Movie 1</a>'
Element='<a href="...">Movie 2</a>'
Element='<a href="...">Movie 3</a>'
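For comparison, here is a rough Python sketch of the same selection. It assumes the question's markup is wrapped in a hypothetical <root> element with the <br> tags self-closed. Python's standard xml.etree.ElementTree supports only a subset of XPath (no following-sibling axis), so the sketch walks the siblings by hand and remembers the heading of the nearest preceding <big>; links_in_section is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <root> element and with
# <br/> self-closed so that it parses as well-formed XML.
SAMPLE = """<root>
<big><b>Staff in:</b></big>
<br/>
<a href='...'>Movie 1</a>
<br/>
<a href='...'>Movie 2</a>
<br/>
<a href='...'>Movie 3</a>
<br/>
<br/>
<big><b>Cast in:</b></big>
<br/>
<a href='...'>Movie 4</a>
</root>"""


def links_in_section(root, heading):
    """Rough procedural equivalent of
    big[b=heading]/following-sibling::a[preceding-sibling::big[1][b=heading]]
    for a flat list of siblings directly under `root`."""
    result = []
    nearest_heading = None  # text of the nearest preceding <big><b>, if any
    for child in root:
        if child.tag == "big":
            b = child.find("b")
            nearest_heading = b.text if b is not None else None
        elif child.tag == "a" and nearest_heading == heading:
            result.append(child.text)
    return result


root = ET.fromstring(SAMPLE)
print(links_in_section(root, "Staff in:"))  # ['Movie 1', 'Movie 2', 'Movie 3']
```

Passing "Cast in:" instead returns ['Movie 4']. Nothing depends on which heading comes next, which is the same flexibility the XPath above has.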
har07
  • All the answers worked like a charm, but this is my favourite, as it only relies on one of the title strings ('Staff in'), which makes it more flexible for my usage since I can use it regardless of what is after (whether it's 'Cast in' or whatever else). Learned a new trick with following/preceding used on the same element too, awesome. – jcnesci Jun 16 '15 at 04:08

//a[preceding::b[text()="Staff in:"] and following::b[text()="Cast in:"]]

This returns all a elements that come after the b element with the text "Staff in:" but before the b element with the text "Cast in:".

You may need to add some more conditions to make it more specific depending on whether or not these b elements are unique on the page.
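To see what the preceding::/following:: pair actually checks, here is a rough Python sketch using the standard library's xml.etree.ElementTree on the question's markup (wrapped in a hypothetical <root> element, with <br/> self-closed). It flattens the tree into document order and keeps the a elements that sit after the first "Staff in:" marker and before the last "Cast in:" marker. links_between is a hypothetical name, and the sketch assumes a and b elements are never nested inside each other, so the ancestor/descendant exclusions of the XPath axes never come into play.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <root> element.
SAMPLE = """<root>
<big><b>Staff in:</b></big>
<br/>
<a href='...'>Movie 1</a>
<br/>
<a href='...'>Movie 2</a>
<br/>
<a href='...'>Movie 3</a>
<br/>
<br/>
<big><b>Cast in:</b></big>
<br/>
<a href='...'>Movie 4</a>
</root>"""


def links_between(root, start_text, end_text):
    """Rough equivalent of
    //a[preceding::b[text()=start_text] and following::b[text()=end_text]]
    for markup where <a> and <b> are never nested inside each other."""
    elements = list(root.iter())  # every element, in document order
    starts = [i for i, e in enumerate(elements) if e.tag == "b" and e.text == start_text]
    ends = [i for i, e in enumerate(elements) if e.tag == "b" and e.text == end_text]
    if not starts or not ends:
        # preceding::b[...] or following::b[...] would be empty, so no <a> matches
        return []
    first_start, last_end = min(starts), max(ends)
    return [e.text for i, e in enumerate(elements)
            if e.tag == "a" and first_start < i < last_end]


root = ET.fromstring(SAMPLE)
print(links_between(root, "Staff in:", "Cast in:"))  # ['Movie 1', 'Movie 2', 'Movie 3']
```

If the end marker is missing (for example, when "Staff in:" is the last section on the page), the predicate can never be true and nothing is returned; that is one of the cases where extra conditions are needed.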

marven

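Just to add to this, and following the Stack Overflow question "XPath axis, get all following nodes until", here is the complete solution I worked out with an XSLT editor. First, /*/ is used instead of // because it is faster (note that /*/big only matches big elements that are direct children of the document element). Second, the logic returns all anchor (a) nodes that are following siblings of big, provided they satisfy the inner condition: their nearest preceding big sibling must be the same node as the big they follow. XPath 1.0 has no node-identity operator, so this is tested by taking the union of the two nodes and checking that count() is 1. This also presumes that each big node is distinct.

The XPath looks like this: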

/*/big[b="Staff in:"]/following-sibling::a[1 = count(preceding-sibling::big[1] | ../big[b="Staff in:"])]

The XSLT solution looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <html>
            <body>
                <h2>My Movie Collection</h2>
                <table border="1">
                    <tr bgcolor="#9acd32">
                        <th>Title</th>
                    </tr>
                    <!-- every big heading that is a direct child of the document element -->
                    <xsl:variable name="placeholder" select="/*/big" />
                    <xsl:for-each select="$placeholder">
                        <xsl:variable name="i" select="position()" />
                        <tr>
                            <td>
                                <b>
                                    <xsl:value-of select="$i" />
                                    <xsl:text>. </xsl:text>
                                    <xsl:value-of select="$placeholder[$i]" />
                                </b>
                            </td>
                        </tr>
                        <!-- links whose nearest preceding big is the current heading:
                             the union holds exactly one node only when both sides are the same node -->
                        <xsl:for-each
                            select="following-sibling::a[1 = count(preceding-sibling::big[1] | ../big[b=$placeholder[$i]])]">
                            <tr>
                                <td>
                                    <xsl:value-of select="." />
                                </td>
                            </tr>
                        </xsl:for-each>
                    </xsl:for-each>
                </table>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>
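The nested loop in the stylesheet can be approximated in Python with this rough sketch, assuming the same hypothetical <root> wrapper around the question's markup. For each big heading it keeps the following a siblings whose nearest preceding big sibling is that very element; Python's `is` identity check plays the role that 1 = count(... | ...) plays in XPath 1.0. group_links is a hypothetical helper, not from any library.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <root> element.
SAMPLE = """<root>
<big><b>Staff in:</b></big>
<br/>
<a href='...'>Movie 1</a>
<br/>
<a href='...'>Movie 2</a>
<br/>
<a href='...'>Movie 3</a>
<br/>
<br/>
<big><b>Cast in:</b></big>
<br/>
<a href='...'>Movie 4</a>
</root>"""


def group_links(root):
    """Return (heading, [link texts]) pairs, one per <big> child of root.

    A link belongs to a heading when its nearest preceding <big> sibling is
    that same element, mirroring
    following-sibling::a[1 = count(preceding-sibling::big[1] | $anchor)].
    """
    children = list(root)
    groups = []
    for anchor_pos, anchor in enumerate(children):
        if anchor.tag != "big":
            continue
        heading = "".join(anchor.itertext())  # string value, like xsl:value-of
        links = []
        nearest_big = anchor  # preceding-sibling::big[1] for nodes right after anchor
        for node in children[anchor_pos + 1:]:
            if node.tag == "big":
                nearest_big = node
            elif node.tag == "a" and nearest_big is anchor:
                # `is` tests node identity, which XPath 1.0 expresses as count(X | Y) = 1
                links.append(node.text)
        groups.append((heading, links))
    return groups


root = ET.fromstring(SAMPLE)
print(group_links(root))
# [('Staff in:', ['Movie 1', 'Movie 2', 'Movie 3']), ('Cast in:', ['Movie 4'])]
```

Returning a list of pairs rather than a dict keeps two sections with the same heading from overwriting each other.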
mkanugan