XPath for all nodes where descendant contains text of parent?

Question

I'm trying to retrieve all <t> nodes in the following expression:

<x>
    <t>10
        <s>,14,14,16,</s>
    </t>
    <t>12
        <s>,14,14,16,</s>
    </t>
    <t>14
        <s>,14,14,16,</s>
    </t>
    <t>14</t>
</x>

The condition is that the child node should contain the text of its parent <t>. Therefore, I tried the following:

//t[.//*[contains(., ',')]]

This nicely retrieved all <t> nodes where the descendant contained a comma. However, I wanted to refer back to its parent, effectively something like: //t[.//*[contains(., concat(',', /.., ','))]]. This, however, returns no matches.

Obviously I'm doing something wrong here. My expected result is only 14. Is it possible to reference descendants and then refer back to their parent? If so, what is the right syntax?

Rollback reason: Never change the question so substantially that it would invalidate existing answers. — kjhughes, Jul 21 '20 at 14:57

E.Wiest · Answer 1 · 2020-07-21T22:54:09.673

2

You could use something like :

//s[contains(.,number(string(parent::t/text())))]/..

Output :

<t>
14  
<s>,14,14,16,</s>
</t>

Another option :

//s[substring(.,2,2)=number(string(parent::t/text()))]/..

EDIT: To fix false positives (this selects the matching s; append /.. as above to get its parent t):

//s[contains(.,concat(",",normalize-space(parent::t/text()),","))]
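For anyone who wants to check the comma-delimited comparison outside an XPath engine, here is a minimal sketch of the same logic in Python. It is not XPath: it uses the standard library's `xml.etree.ElementTree`, whose XPath subset has no `contains()` or `concat()`, so the test is written by hand. The `normalize_space` helper and the `matching_t_texts` name are my own, and the extra `<t>1 ...</t>` element is added only to exercise the false-positive case from the comments.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, plus a hypothetical <t>1 ...</t> element
# to exercise the false-positive case discussed in the comments.
XML = """<x>
    <t>10
        <s>,14,14,16,</s>
    </t>
    <t>12
        <s>,14,14,16,</s>
    </t>
    <t>14
        <s>,14,14,16,</s>
    </t>
    <t>14</t>
    <t>1
        <s>,14,14,16,</s>
    </t>
</x>"""


def normalize_space(text):
    """Rough stand-in for XPath normalize-space(): trim and collapse whitespace."""
    return " ".join((text or "").split())


def matching_t_texts(xml_text):
    """Return the normalized text of each <t> whose <s> child lists that text."""
    root = ET.fromstring(xml_text)
    hits = []
    for t in root.iter("t"):
        s = t.find("s")          # first <s> child, or None if there is none
        if s is None:
            continue
        key = normalize_space(t.text)   # text before the first child element
        # Wrapping the key in commas means only whole list members can match.
        if f",{key}," in "".join(s.itertext()):
            hits.append(key)
    return hits


print(matching_t_texts(XML))
```

With this data only the third `<t>` (14) should qualify: 10, 12 and 1 are not whole members of the list, and the last `<t>14</t>` has no `<s>` child.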

edited Jul 21 '20 at 22:54

answered Jul 21 '20 at 14:35

E.Wiest


Thanks for your response. Interesting, as XPath is not my forte but I'm keen to learn. I'm using it in Excel's `FILTERXML` function. However, your approach gives me false positives. It would return `1` in `<t>1<s>,14,14,16,</s></t>`. Hence my concatenation with commas. – JvdV Jul 21 '20 at 14:43
However, what did work was: `//t[.//*[contains(.,concat(',',../text(),','))]]` and to filter out duplicates I added: `[not(preceding::*=.)]`. Therefore, thanks again for your contribution =) – JvdV Jul 21 '20 at 15:27
1

Nice. Yes, it makes sense to use `concat` and commas if you have `1` (which was absent from your sample data). For safety, it's probably better to add a `normalize-space()` (in the concat step) to your XPath expression. One last thing: you wrote a great topic about Excel's `FILTERXML` function, full of details and XPath examples. Keep up the good work. :) – E.Wiest Jul 21 '20 at 23:02
Thanks for the support. I'll keep the normalisation in mind – JvdV Jul 22 '20 at 06:13

kjhughes · Accepted Answer · 2020-07-21T15:00:01.000

2

This XPath,

//t[contains(s,normalize-space(text()[1]))]

will select all t elements whose first text node, after whitespace normalization, is found as a substring of their s child element.

Note that this might yield false positives for cases such as

<t>1
    <s>,14,14,16,</s>
</t>

One can easily adapt the XPath idiom for space-separated classes to avoid this problem:

//t[contains(concat(' ', translate(s,',',' '), ' ') ,
             concat(' ', normalize-space(text()[1]), ' '))]
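To see why the padded version avoids the false positive, here is a small sketch that mirrors both expressions on plain strings in Python. It is an illustration of the idiom rather than an XPath evaluation; the function names `naive_match` and `token_match` are hypothetical, and `normalize_space` is only a rough stand-in for the XPath function.

```python
def normalize_space(text):
    """Rough stand-in for XPath normalize-space(): trim and collapse whitespace."""
    return " ".join(text.split())


def naive_match(t_text, s_text):
    # Mirrors contains(s, normalize-space(text()[1])): a plain substring test.
    return normalize_space(t_text) in s_text


def token_match(t_text, s_text):
    # Mirrors the adapted idiom: commas become spaces (translate), then both
    # sides are padded with spaces (concat) so only whole members can match.
    haystack = " " + s_text.replace(",", " ") + " "
    needle = " " + normalize_space(t_text) + " "
    return needle in haystack


s_value = ",14,14,16,"
for t_value in ("1\n    ", "14\n    ", "10\n    "):
    print(repr(normalize_space(t_value)),
          naive_match(t_value, s_value),
          token_match(t_value, s_value))
```

For `1` the substring test succeeds because `1` occurs inside `14`, while the padded test fails because ` 1 ` never appears in the space-separated list; for `14` both succeed, and for `10` both fail.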

edited Jul 21 '20 at 15:00

answered Jul 21 '20 at 14:35

kjhughes


Thanks for your response. As per the other answer I'm receiving false positives unfortunately for exactly a case as per your *note*. Hence why I was looking at concatenation. I'll update my question since Excel gives me the freedom to turn the question around if that would make it *"easier"*. – JvdV Jul 21 '20 at 14:45
1

I've updated the answer to show exactly how to adapt the idiom for space-separated classes to avoid the false positive selections here. – kjhughes Jul 21 '20 at 15:01
That's great. I've implemented it in Excel's `FILTERXML` and with some adjusting to return non-empty non-dups I ended up with `//t[contains(concat(' ', translate(s,',',' '), ' ') ,concat(' ', normalize-space(text()[1]), ' '))][not(preceding::*=.)][node()]` – JvdV Jul 21 '20 at 15:02
Yes, if the alternative XML design is still a consideration for you even though you have an XPath that covers your original XML design, feel free to post a new question. Frankly, however, I do not see the alternative as offering much of an advantage. One general XML design tip might instead be to use elements everywhere rather than commas as delimiters when you care about the individual members of a list. – kjhughes Jul 21 '20 at 15:13
Hm, wish I could accept two answers here. Though it gave me false positives, the answer by @E.Wiest pointed me in the direction of `//t[.//*[contains(.,concat(',',parent::t/text(),','))]]`, which worked great for me. – JvdV Jul 21 '20 at 15:16
Is there some way in which the XPath given in this answer is not working well for you? If it is meeting your needs perfectly, it also has the advantage of being simpler than what you've listed in your comment. – kjhughes Jul 21 '20 at 15:17
No, all is good. Works fine. I was just trying out different ways (eager to learn about XPath) and after some fiddling around came to the above syntax, which worked too. – JvdV Jul 21 '20 at 15:19
