XPath: Select node but not specific child elements

Question

I have a structure similar to the following:

<page id='1'>
  <title>Page 1</title>    
  <page id='2'>
    <title>Sub Page 1</title>
  </page>
  <page id='3'>
    <title>Sub Page 2</title>
  </page>    
</page>
<page id='4'>
  <title>Page 2</title>
</page>

I need to select a page by id. If that page has descendant pages, I don't want those elements returned, but I do want the other elements of that page. For example, if I select Page 1, I want its title but not its child pages.

//page[@id=1]

The above gets me page 1, but how do I exclude the sub pages? Also, there could be an arbitrary number of elements in a page.

//page[@id=1]/*[not(self::page)]

I have found that this gets me the data I want. However, the data comes back as an array of objects, one object per element, and apparently without the element names. I am using PHP SimpleXML, for what it is worth.

Good question, +1. See my answer for a short and simple solution. :) — Dimitre Novatchev, Aug 19 '11 at 13:26
"However, that data comes back as an array of objects with one object per element." How is that different from what you want/need? — LarsH, Aug 19 '11 at 19:30
The data comes back in a different format depending on the XPath query: I get an array of SimpleXMLElement objects with a single string in each, and the element names are missing. The first case returns a single SimpleXMLElement object with all the expected key/value pairs. I don't understand why; perhaps I will open another question. — Ben, Aug 20 '11 at 07:07

Dimitre Novatchev · Accepted Answer · 2011-08-20T14:18:27.567

Use:

//page[@id=$yourId]/node()[not(self::page)]

This selects every child node, other than page elements, of any page element in the document whose id attribute has a string value equal to the string contained in $yourId (most probably you would substitute $yourId above with a specific, desired string, such as '1'). Note that node() also matches text, comment, and processing-instruction children; use * instead to select element children only.

Here is a simple XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:param name="pId" select="3"/>

 <xsl:template match="/">
     <xsl:copy-of select="//page[@id=$pId]/node()[not(self::page)]"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document (wrapped in a single top-level element to make it well-formed):

<pages>
    <page id='1'>
        <title>Page 1</title>
        <page id='2'>
            <title>Sub Page 1</title>
        </page>
        <page id='3'>
            <title>Sub Page 2</title>
        </page>
    </page>
    <page id='4'>
        <title>Page 2</title>
    </page>
</pages>

the wanted, correct result is produced:

<title>Sub Page 2</title>
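
For readers without an XSLT processor at hand, the same check can be sketched with Python's standard library. This is only an illustration under a few assumptions: xml.etree.ElementTree implements a subset of XPath that lacks the self:: axis and not(), so the exclusion is written as a Python filter; it returns child elements only (closer to * than to node()); and the helper name children_except_pages is made up for this example.

```python
import xml.etree.ElementTree as ET

# The question's data, wrapped in a single <pages> root as in the XSLT check.
XML = """<pages>
    <page id='1'>
        <title>Page 1</title>
        <page id='2'>
            <title>Sub Page 1</title>
        </page>
        <page id='3'>
            <title>Sub Page 2</title>
        </page>
    </page>
    <page id='4'>
        <title>Page 2</title>
    </page>
</pages>"""


def children_except_pages(root, page_id):
    """Return the child elements of every <page id=page_id> that are not <page>.

    Hypothetical helper: ElementTree understands .//page[@id='...'] but not
    not(self::page), so the exclusion is applied in Python instead.
    """
    selected = []
    for page in root.findall(f".//page[@id='{page_id}']"):
        selected.extend(child for child in page if child.tag != "page")
    return selected


root = ET.fromstring(XML)
result_3 = [(e.tag, e.text) for e in children_except_pages(root, "3")]
result_1 = [(e.tag, e.text) for e in children_except_pages(root, "1")]
print(result_3)
print(result_1)
```

For id 1 only the title comes back; the two nested pages are left out.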

Do note: this assumes that an id value uniquely identifies a page. If that is not so, the proposed XPath expression will select the non-page children of every page element whose id attribute has a string value of $yourId.

If this is the case and only one page element must be selected, the OP must specify which one of the many page elements with this id should be selected.

For example, it may be the first:

(//page[@id=$yourId])[1]/node()[not(self::page)]

or the last:

(//page[@id=$yourId])[last()]/node()[not(self::page)]

or ...
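
Choosing one page among several that share an id can be sketched in Python as follows. The input is hypothetical (the question's data has no duplicate ids), non_page_children is a made-up helper, and the filtering is done in Python because xml.etree.ElementTree does not support not(self::page).

```python
import xml.etree.ElementTree as ET

# Hypothetical input: two pages share id='1' (not the case in the question's data).
XML = """<pages>
    <page id='1'>
        <title>First</title>
        <page id='2'><title>Sub</title></page>
    </page>
    <page id='1'>
        <title>Second</title>
    </page>
</pages>"""


def non_page_children(page):
    """Child elements of one page, leaving out nested <page> elements."""
    return [child for child in page if child.tag != "page"]


root = ET.fromstring(XML)
matches = root.findall(".//page[@id='1']")  # all matches, in document order

# Pick one page explicitly, then take its non-page children.
first_children = non_page_children(matches[0]) if matches else []
last_children = non_page_children(matches[-1]) if matches else []

print([e.text for e in first_children])
print([e.text for e in last_children])
```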

While this looks exactly right, it doesn't actually work.. I am not sure if there is something wrong with xpath in PHP's simple xml, but this returns multiple copies of the requested page??? — Ben, Aug 20 '11 at 12:49
@Ben: This may happen only if more than one `page` can have the same value of its `id` attribute. I have updated my answer to cover this case. I also provide a simple verification showing that the initial XPath expression selects exactly one `page` element if an `id` value uniquely identifies a `page`. — Dimitre Novatchev, Aug 20 '11 at 14:20

Scott Ferguson · Answer 2 · 2011-08-19T01:21:45.563

1

If you're only interested in the title element, this would work:

//page[@id=1]/title

If, however, you need other sub-elements of page, I'm not sure XPath is the right tool for you. It sounds more like a job for XSLT, since what you are really doing is transforming your data.
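
If the goal is a single page element that keeps its own fields but drops its nested pages, one option is to do that small transformation in the host language. Below is a rough sketch in Python, assuming the question's data wrapped in a single root element (pages here); page_without_subpages is a hypothetical helper, and the same idea would have to be ported to PHP for the OP's setup.

```python
import copy
import xml.etree.ElementTree as ET

# The question's data, wrapped in a single <pages> root to make it well-formed.
XML = """<pages>
    <page id='1'>
        <title>Page 1</title>
        <page id='2'>
            <title>Sub Page 1</title>
        </page>
        <page id='3'>
            <title>Sub Page 2</title>
        </page>
    </page>
    <page id='4'>
        <title>Page 2</title>
    </page>
</pages>"""


def page_without_subpages(root, page_id):
    """Return a detached copy of the first matching page, minus nested pages."""
    page = root.find(f".//page[@id='{page_id}']")
    if page is None:
        return None
    clone = copy.deepcopy(page)        # leave the source tree untouched
    for sub in clone.findall("page"):  # direct <page> children only
        clone.remove(sub)
    return clone


root = ET.fromstring(XML)
page = page_without_subpages(root, "1")
fields = {child.tag: child.text for child in page}
print(page.get("id"), fields)
```

The result is one element with its attributes and named children intact, which is closer to the shape the plain //page[@id=1] query returns.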

answered Aug 19 '11 at 01:14 · edited Aug 19 '11 at 01:21

Updated answer with further information. Feel free to upvote if it's helpful in any way. :) – Scott Ferguson Aug 19 '11 at 01:22
Thanks, I am starting to think XPath maybe cannot do this. I can always write something to process out the data I want but was hoping to do it at the data level. – Ben Aug 19 '11 at 01:32

score 1 · Answer 3 · edited Sep 19 '14 at 14:02

If every page always has a title, you can exclude the children that themselves contain a title:

//page[@id='1']/*[not(boolean(./title))]
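
The predicate tests each child for a title of its own, not for being a page, which is why the assumption matters. Here is a small sketch of that behavior in Python, with the predicate written as a plain filter because xml.etree.ElementTree does not support not() or boolean(). The untitled page 3 is a made-up case added to show what slips through.

```python
import xml.etree.ElementTree as ET

# Hypothetical variation of the question's data: page 3 has no <title>.
XML = """<pages>
    <page id='1'>
        <title>Page 1</title>
        <page id='2'><title>Sub Page 1</title></page>
        <page id='3'/>
    </page>
</pages>"""

root = ET.fromstring(XML)
page = root.find(".//page[@id='1']")

# Mirrors *[not(boolean(./title))]: keep children that have no <title> child.
selected = [child for child in page if child.find("title") is None]
summary = [(child.tag, child.get("id")) for child in selected]
print(summary)
```

Page 2 is dropped because it has a title, while the untitled page 3 is still selected alongside the title element.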

answered Aug 19 '11 at 01:59 by Msyk · edited Sep 19 '14 at 14:02 by Dan Atkinson
