
I have a list of XPath expressions. Is there a convenient way in .NET to test that no XPath expressions in the list overlap in their addressed scope? Example:

string1 = /nodeA/nodeB/nodeC  
string2 = /nodeA/nodeB/nodeD  

This would return false, as there is no overlap.

string1 = /nodeA/nodeB/nodeC  
string2 = /nodeA/nodeB/nodeC/nodeF  

In case of this pair of arguments, it would return true as the set of elements selected by the expression string1 overlaps the result of evaluating string2.

EDIT:
Given the answers below, I have learned that there is an "intersect" solution, but because I need to resolve overlap "conflicts" without applying the expressions to an XML document, I cannot use it. I now understand that the arbitrary case is close to impossible, so my solution will be to restrict the type of expressions that are allowed and do string comparisons.
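
A minimal sketch of that restricted approach, written in Python rather than .NET purely for illustration. It assumes every allowed expression is an absolute path of plain child steps (no predicates, wildcards, `//`, or namespace prefixes); the names `SIMPLE_PATH`, `steps`, and `overlaps` are hypothetical. Under that assumption, two paths overlap in the subtree sense when one is a step-wise prefix of the other:

```python
import re

# Assumption: only absolute paths made of plain element names are allowed.
SIMPLE_PATH = re.compile(r"(/[A-Za-z_][\w.-]*)+")

def steps(path):
    """Split '/a/b/c' into ['a', 'b', 'c'], rejecting anything more complex."""
    if not SIMPLE_PATH.fullmatch(path):
        raise ValueError(f"unsupported expression: {path!r}")
    return path[1:].split("/")

def overlaps(path1, path2):
    """True if one path is a step-wise prefix of (or equal to) the other."""
    a, b = steps(path1), steps(path2)
    shared = min(len(a), len(b))
    return a[:shared] == b[:shared]

print(overlaps("/nodeA/nodeB/nodeC", "/nodeA/nodeB/nodeD"))        # False
print(overlaps("/nodeA/nodeB/nodeC", "/nodeA/nodeB/nodeC/nodeF"))  # True
```

Comparing whole steps instead of raw strings avoids treating `/nodeA/nodeB` as a prefix of `/nodeA/nodeBB`.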

Jörgen
  • You need to be more specific about your question. Do you want to test if the *results* overlap for some specific context? Or whether they could overlap for any context whatsoever? Or just whether the paths themselves "overlap" - you could use string operations to detect that. The other things are in general impossible to solve for arbitrary paths without executing them on data. – Mike Sokolov Jan 24 '13 at 20:55

4 Answers


I wrote code with a similar purpose to this once, for the stylesheet used to produce the errata for the XSLT 2.0 and related specs. You might find it gives you some ideas. This contains logic to check that there was no text in the spec "affected" by more than one erratum. The code is as follows (it uses saxon:evaluate because the XML document defining an erratum contains XPath expressions indicating which sections in the base document are "affected").

A key aim here is not to determine whether two XPath expressions select overlapping nodes, but whether N such expressions have any overlaps, where the expressions are not known in advance - and we want to do this without looking for overlaps between all pairs of expressions.

There's an interesting expression here: count($x) != count($x/.). Here ($x/.) is used to force elimination of duplicate nodes from a node sequence, so the test is asking whether elimination of duplicates removes any nodes, i.e. whether $x contains any duplicates.

<!-- The following template checks that there is no element in the source document
     that is replaced or deleted by more than one erratum -->

<xsl:template name="check-for-conflicts">
  <xsl:variable name="directly-affected-elements" 
      select="er:eval-all(/er:errata/er:erratum[not(@superseded)]//er:old-text[not(starts-with(@action, 'insert-'))])"/>
  <xsl:variable name="all-affected-elements"
      select="for $e in $directly-affected-elements return $e/descendant-or-self::*"/>
  <xsl:if test="count($all-affected-elements) != count($all-affected-elements/.)">
     <!-- we now know there are duplicates, we just need to identify them... -->
     <xsl:for-each-group select="$all-affected-elements" group-by="generate-id()">
       <xsl:if test="count(current-group()) gt 1">
         <xsl:variable name="id" select="(ancestor::*/@id)[last()]"/>
         <xsl:variable name="section" select="$spec/key('id',$id)"/>
         <xsl:variable name="section-number">
           <xsl:number select="$section" level="multiple" count="div1|div2|div3|div4"/>
         </xsl:variable>
         <xsl:variable name="loc" select="er:location($section, .)"/>
         <p style="color:red">
           <xsl:text>WARNING: In </xsl:text>
           <xsl:value-of select="$section-number, $section/head"/>
           <xsl:text> (</xsl:text>
           <xsl:value-of select="$loc"/>
           <xsl:text>) Element is affected by more than one change</xsl:text>
         </p>
       </xsl:if>
     </xsl:for-each-group>
   </xsl:if>           
</xsl:template>

<!-- Support function for the check-for-conflicts template.
     This function builds a list (retaining duplicates) of all elements
     in the source document directly affected by a replacement or deletion.
     "Directly affected" means that the element is explicitly selected for replacement
     or deletion; the descendants of this element are indirectly affected. -->       

<xsl:function name="er:eval-all" as="element()*">
  <xsl:param name="in" as="element(er:old-text)*"/>
  <xsl:for-each select="$in">
    <xsl:variable name="id" select="@ref"/>
    <xsl:variable name="section" select="$spec/key('id',$id)"/>
    <xsl:variable name="exp" select="@select"/>
    <xsl:variable name="nodes" select="$section/saxon:evaluate($exp)"/>
    <xsl:sequence select="$nodes"/>
  </xsl:for-each>
</xsl:function>
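
For readers without an XSLT 2.0 processor, the duplicate-detection idea can be sketched in Python with the standard `xml.etree.ElementTree` module. This is not the stylesheet's code: the function name `affected_more_than_once` is hypothetical, the paths are ElementTree's limited relative paths rather than full XPath, and elements are compared by Python object identity. Each selected element is expanded to its descendant-or-self subtree, and any element reached more than once is reported, without comparing expressions pair by pair:

```python
import xml.etree.ElementTree as ET
from collections import Counter

def affected_more_than_once(root, paths):
    """Return elements that lie in the subtree of more than one selected element."""
    hits = Counter()
    by_id = {}
    for path in paths:
        for selected in root.findall(path):
            for element in selected.iter():  # the element itself, then its descendants
                hits[id(element)] += 1
                by_id[id(element)] = element
    return [by_id[key] for key, count in hits.items() if count > 1]

doc = ET.fromstring("<nodeA><nodeB><nodeC><nodeF/></nodeC><nodeD/></nodeB></nodeA>")

conflicts = affected_more_than_once(doc, ["./nodeB/nodeC", "./nodeB/nodeC/nodeF"])
print([e.tag for e in conflicts])  # ['nodeF']

print(affected_more_than_once(doc, ["./nodeB/nodeC", "./nodeB/nodeD"]))  # []
```

As in the stylesheet, a single expression that selects both an element and one of its descendants also counts as a conflict.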
Michael Kay

You could use the intersect operator introduced in XPath 2.0 to check if the intersection of the sets selected by these expressions is empty.

/nodeA/nodeB/nodeC intersect /nodeA/nodeB/nodeC/nodeF

AFAIK, Saxon supports XPath 2.0 and it's available for the .NET platform. There are some alternatives mentioned in answers to this SO question. You'll have to take a look around on your own. I'm not really a .NET person so I'm not familiar with what's new in the platform.

You may also look for a solution based on LINQ to XML. Such an approach is discussed in this MSDN thread, and there is a summary on the blog of that thread's original poster, Santosh Benjamin.
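
To illustrate what an intersection check reports on a concrete document, here is a rough Python sketch using the standard `xml.etree.ElementTree` module instead of an XPath 2.0 engine. The helpers `select` and `intersect` are hypothetical names; ElementTree only accepts relative paths, so the sketch wraps the document in a synthetic parent element, and it only handles the simple path subset ElementTree supports:

```python
import xml.etree.ElementTree as ET

def select(document, absolute_path):
    """Evaluate a simple absolute path against the document element."""
    holder = ET.Element("holder")  # synthetic parent standing in for the document node
    holder.append(document)
    return holder.findall("." + absolute_path)

def intersect(document, path1, path2):
    """Elements selected by both paths, compared by identity rather than by value."""
    second = {id(e) for e in select(document, path2)}
    return [e for e in select(document, path1) if id(e) in second]

doc = ET.fromstring("<nodeA><nodeB><nodeC><nodeF/></nodeC></nodeB></nodeA>")

common = intersect(doc, "/nodeA/nodeB/nodeC", "/nodeA/nodeB/nodeC/nodeF")
print(common)  # []

same = intersect(doc, "/nodeA/nodeB/nodeC", "/nodeA/nodeB/*")
print([e.tag for e in same])  # ['nodeC']
```

On this sample document the first pair has an empty intersection, because `nodeC` and `nodeF` are different nodes even though one contains the other.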

toniedzwiedz
  • My interpretation of the question is different from yours (though both are guesses) - I suspect he wants to know whether the subtrees rooted at the selected nodes overlap, not whether the node-sets themselves are disjoint. – Michael Kay Jan 24 '13 at 23:26

There is no realistic way to determine overlap for all possible inputs, but you can do it easily for a given one.

Overlap on a given input

/nodeA/nodeB/nodeC//*[. = /nodeA/nodeB/nodeD//*]

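will find all nodes in the result set of the left query which also match a node in the result set of the right query. For XPath 1.0, this mimics the intersect operator which @Tom proposed. Note that = compares string values rather than node identity, so distinct nodes with equal text content also match. If it returns an empty set, the queries do not overlap for this input; if there is a result, they do.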

Why it's hard to find overlap on arbitrary input

Think of two queries like

  • //*[@id] (return all nodes which have an "id" attribute)
  • //someNode (return all <someNode/> elements)

If there is an element <someNode id="..."/>, there will be overlap, otherwise not. Solving this for arbitrary XPath expressions on arbitrary input will get really hard.
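
A small Python sketch of that input dependence, using the standard `xml.etree.ElementTree` module. The function name `overlap_on` and the sample documents are made up for illustration; the same two queries overlap on one input and not on the other:

```python
import xml.etree.ElementTree as ET

def overlap_on(xml_text):
    """True if //*[@id] and //someNode select a common element in this document."""
    root = ET.fromstring(xml_text)
    holder = ET.Element("holder")  # synthetic parent so the root element is searched too
    holder.append(root)
    with_id = {id(e) for e in holder.findall(".//*[@id]")}
    return any(id(e) in with_id for e in holder.findall(".//someNode"))

print(overlap_on('<root><someNode id="x"/></root>'))           # True
print(overlap_on('<root><someNode/><other id="x"/></root>'))   # False
```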

Jens Erat

For two XPath expressions use:

not(Expr1[count(.|Expr2) = count(Expr2)])

If this is evaluated on every pair of expressions you have, and the answer is always true(), then and only then no two expressions select the same node.

If you need to find whether there is a node that is selected by all expressions:

not((Expr1 | Expr2 ... | ExprN)
      [count(.|Expr1) = count(Expr1)
     and
       count(.|Expr2) = count(Expr2)
     .  .  .  .  .  .  .  .  .  .  .  .
     and
       count(.|ExprN) = count(ExprN)
      ]
    )

is true() if there isn't such a node.
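
The `count(.|Expr) = count(Expr)` idiom tests membership: adding a node to a set that already contains it does not change the set's size. A rough Python rendering of both checks follows, using the standard `xml.etree.ElementTree` module with its limited relative paths instead of full XPath. The function names are hypothetical, and nodes are compared by object identity:

```python
import xml.etree.ElementTree as ET
from itertools import combinations

def contains(node_set, node):
    """count(.|Expr) = count(Expr): the union with a member is no larger."""
    ids = {id(n) for n in node_set}
    return len(ids | {id(node)}) == len(ids)

def no_pair_shares_a_node(root, paths):
    """True only if no two paths select the same element."""
    results = [root.findall(p) for p in paths]
    return not any(contains(b, n) for a, b in combinations(results, 2) for n in a)

def selected_by_all(root, paths):
    """Elements that every path selects."""
    results = [root.findall(p) for p in paths]
    union = {id(n): n for r in results for n in r}
    return [n for n in union.values() if all(contains(r, n) for r in results)]

doc = ET.fromstring("<nodeA><nodeB><nodeC><nodeF/></nodeC><nodeD/></nodeB></nodeA>")

print(no_pair_shares_a_node(doc, ["./nodeB/nodeC", "./nodeB/nodeD"]))  # True
print(no_pair_shares_a_node(doc, ["./nodeB/nodeC", "./nodeB/*"]))      # False

shared = selected_by_all(doc, ["./nodeB/nodeC", "./nodeB/*", ".//nodeC"])
print([e.tag for e in shared])  # ['nodeC']
```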

Dimitre Novatchev
  • My interpretation of the question is different from yours (though both are guesses) - I suspect he wants to know whether the subtrees rooted at the selected nodes overlap, not whether the node-sets themselves are disjoint. – Michael Kay Jan 24 '13 at 23:26
  • @MichaelKay, I believe my interpretation is likely to be correct -- because the OP says "In case of this pair of arguments, it would return true as the set of elements selected by the expression string1 overlaps **the result of evaluating string2**" – Dimitre Novatchev Jan 24 '13 at 23:30
  • Yes, but his example suggests that he considers a/b to overlap a/b/c, where the node-sets are disjoint but their subtrees aren't. – Michael Kay Jan 25 '13 at 08:52