Finding the position index of a comment()

Question

Faced with this:

<div>
some text
<!-- this is the hook comment-->
target part 1
target part 2
<!-- this is another comment-->
some other text
</div>

I'm trying to get to the desired output of:

target part 1 target part 2

The number of comments and text nodes is unknown, but the target text always comes right after the comment containing 'hook'. So the idea is to find the position() of the relevant comment() and get the node that follows it.

There are some previous questions about finding the position of an element containing a certain text or having a certain attribute, but comment() is an odd duck and I can't adapt the answers there to this situation. For example, trying a variation on those answers:

//comment()[contains(string(),'hook')]/preceding::*

or using preceding-sibling::*, returns nothing.

So I decided to try something else. Running count(//node()) on the XML returns 6, and //node()[2] returns the relevant comment(). But when I try to get the position of that comment using index-of() (which I expected to return 2)

index-of(//node(),//comment()[contains(string(),'hook')])

it returns 3!

Of course, I can disregard that and use index 3 as the position of the target text (instead of incrementing 2 by 1), but I was wondering, first, why the outcome is what it is and, second, whether it has any unintended consequences.

score 1 · Accepted Answer · answered Sep 20 '19 at 18:26

There is no need to first find the position() of the nodes if you want to get the nodes between two comments (FYI, position() depends on the whole node-set you selected).
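That remark about position() also explains the 2-versus-3 puzzle in the question. //node() abbreviates /descendant-or-self::node()/child::node(), so in //node()[2] the [2] filters each parent's list of children: it selects every node that is the second child of its parent, which here is the hook comment (the second child of div). In the full //node() sequence, however, the comment is the third item (div, then the "some text" text node, then the comment), and that is the number index-of() reports; (//node())[2], with parentheses, would select the "some text" node instead. One possible unintended consequence: index-of() compares atomized values with eq rather than testing node identity, so any other node with the same string value would also match. Python's standard library has no full XPath engine, so the sketch below emulates the selections with a plain xml.dom.minidom walk; document_order and is_second_child are hypothetical helper names, not library functions.

```python
from xml.dom import Node, minidom

QUESTION_XML = """<div>
some text
<!-- this is the hook comment-->
target part 1
target part 2
<!-- this is another comment-->
some other text
</div>"""


def document_order(node):
    """All nodes below `node` in document order, i.e. what (//node()) selects."""
    out = []
    for child in node.childNodes:
        out.append(child)
        out.extend(document_order(child))
    return out


def is_second_child(node):
    """True if `node` is the second child of its parent: the test behind //node()[2]."""
    siblings = node.parentNode.childNodes
    return len(siblings) >= 2 and siblings[1] is node


doc = minidom.parseString(QUESTION_XML)
all_nodes = document_order(doc)
hook = next(n for n in all_nodes
            if n.nodeType == Node.COMMENT_NODE and "hook" in n.data)

print(len(all_nodes))                  # 6, like count(//node())
second_children = [n for n in all_nodes if is_second_child(n)]
print(len(second_children) == 1 and second_children[0] is hook)  # True: //node()[2]
print(repr(all_nodes[1].data))         # '\nsome text\n', like (//node())[2]
hook_index = next(i for i, n in enumerate(all_nodes, start=1) if n is hook)
print(hook_index)                      # 3, like index-of(//node(), hook)
```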

You can select the nodes directly; here they are text() nodes. So a sample file like

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <div>
    some text
    <!-- this is the hook comment-->
    target part 1
    target part 2
    <!-- this is another comment-->
    some other text
        <!-- this is another comment-->
    no one needs this
        <!-- this is another comment-->
    this is also useless
        <!-- this is another hook comment-->
    second target text
        <!-- this is another comment-->
    again some useless crap
        <!-- this is another comment-->
    and the last piece that noone needs
    </div> 
</root>

can be queried with the following expression

//comment()[contains(string(),'hook')]/following-sibling::text()[preceding-sibling::comment()[1][contains(string(),'hook')]]

to result in

target part 1
target part 2

second target text
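To make the logic of that expression concrete, here is a minimal sketch that reproduces it with Python's standard xml.dom.minidom (which keeps comment nodes) instead of an XPath engine. Assumptions: the sample is shortened and its XML declaration dropped, hook_targets is a hypothetical helper name, CDATA sections are ignored, and whitespace is collapsed for display much as normalize-space() would. The key idea is that preceding-sibling::comment()[1] is the nearest comment before each text node, so a text node qualifies only if that nearest comment contains 'hook'.

```python
from xml.dom import Node, minidom

# The answer's sample, shortened and without the XML declaration.
SAMPLE = """<root>
    <div>
    some text
    <!-- this is the hook comment-->
    target part 1
    target part 2
    <!-- this is another comment-->
    some other text
        <!-- this is another comment-->
    no one needs this
        <!-- this is another hook comment-->
    second target text
        <!-- this is another comment-->
    again some useless crap
    </div>
</root>"""


def hook_targets(node, needle="hook"):
    """Collect text nodes whose nearest preceding sibling comment contains `needle`.

    Emulates //comment()[contains(string(),'hook')]/following-sibling::text()
    [preceding-sibling::comment()[1][contains(string(),'hook')]] in document order.
    """
    found = []
    last_comment = None  # stands in for preceding-sibling::comment()[1]
    for child in node.childNodes:
        if child.nodeType == Node.COMMENT_NODE:
            last_comment = child
        elif child.nodeType == Node.TEXT_NODE:
            if last_comment is not None and needle in last_comment.data:
                found.append(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            found.extend(hook_targets(child, needle))  # descend, like //
    return found


doc = minidom.parseString(SAMPLE)
# Collapse runs of whitespace for display, similar to normalize-space().
targets = [" ".join(t.data.split()) for t in hook_targets(doc)]
print(targets)      # ['target part 1 target part 2', 'second target text']
print(targets[:1])  # first block only, like wrapping the expression in (...)[1]
```

Note that "target part 1" and "target part 2" come back as a single text node, since no markup separates them.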

If you only want the first block, restrict the expression to the first item:

(//comment()[contains(string(),'hook')]/following-sibling::text()[preceding-sibling::comment()[1][contains(string(),'hook')]])[1]

Its result is

target part 1
target part 2

as desired.

If you can use XPath 2.0, you can append /position() to the expressions above to get positions. But, as mentioned above, these positions are relative to the selected sequence of text nodes, not to the document. So the result of the first expression would be 1 2.
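The difference between a position within a selection and a node's place in the document can be sketched the same way. This is an illustration under stated assumptions: the input is a made-up compact snippet with two hook comments, and document_order and hook_targets are hypothetical helpers that emulate (//node()) and the answer's expression with xml.dom.minidom rather than a real XPath engine. Appending /position() numbers the selected text nodes 1 and 2, while their places in the full (//node()) sequence are 4 and 8.

```python
from xml.dom import Node, minidom


def document_order(node):
    """All nodes below `node` in document order, i.e. what (//node()) selects."""
    out = []
    for child in node.childNodes:
        out.append(child)
        out.extend(document_order(child))
    return out


def hook_targets(node, needle="hook"):
    """Text nodes whose nearest preceding sibling comment contains `needle`."""
    found = []
    last_comment = None  # stands in for preceding-sibling::comment()[1]
    for child in node.childNodes:
        if child.nodeType == Node.COMMENT_NODE:
            last_comment = child
        elif child.nodeType == Node.TEXT_NODE:
            if last_comment is not None and needle in last_comment.data:
                found.append(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            found.extend(hook_targets(child, needle))
    return found


# Made-up compact input with two hook comments.
doc = minidom.parseString("<div>a<!--hook-->t1<!--x-->b<!--hook 2-->t2<!--x-->c</div>")
selected = hook_targets(doc)
all_nodes = document_order(doc)

# Like .../position(): numbering starts at 1 within the selected sequence.
positions = [pos for pos, _ in enumerate(selected, start=1)]
# Place of each selected node in the whole (//node()) sequence, matched by identity.
doc_indexes = [next(i for i, n in enumerate(all_nodes, start=1) if n is s)
               for s in selected]
print(positions, doc_indexes)  # [1, 2] [4, 8]
```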

Those expressions make your head spin but sure do work! Thanks. — Jack Fleeting, Sep 20 '19 at 20:23
