I have to search in "ordered" XML files where the text I want to retrieve is spread over several nodes, like this:

<root>
    <div id="1">Hello</div>
    <div id="2">Hel</div>
    <div id="3">lo dude</div>   
    <div id="4">H</div>
    <div id="5">el</div>
    <div id="6">lo</div>
</root>

The search has to be done on the concatenated text:

HelloHello dudeHello

But I need to be able to retrieve the attributes of the matching nodes. For instance, for an 'll' search, I want to get these nodes:

<div id="1">Hello</div>
<div id="2">Hel</div>
<div id="3">lo dude</div>   
<div id="5">el</div>
<div id="6">lo</div>

or at least the ids.

Does anyone have an idea how to do this in XPath, or by any other means?

I think it's a bit challenging, and I have no (simple) idea for the moment. Thanks for your help.

EDIT: the text must be concatenated before the search. This is key information that I should have stated explicitly!

  • Looking at the given output, I guess your search token is actually `l`. If not, please explain why `@id` 2, 3, 5, 6 are included for an `ll` search. – Jens Erat Apr 12 '13 at 11:49
  • OK, I have to be more precise: the text must be concatenated before the search... I'm going to edit my question. – user2273807 Apr 12 '13 at 12:47
  • Do you need this solved for _all_ search tokens or only this one? A specific solution is quite easy, a general one is rather complex. What XPath engine do you use, and are you bound to it? Would XQuery be fine, too? – Jens Erat Apr 12 '13 at 13:28
  • I'm looking for a solution for all search tokens! I'm not bound to any XPath engine or solution. Maybe there is no way to do this without complex development... – user2273807 Apr 12 '13 at 13:48

2 Answers

0

Your updated requirements make the problem much more complex, as the "element wrap" can occur at arbitrary points inside the search token and can even span multiple elements. I don't think you will be able to write such a query in XPath < 3.0 (if it can be done in pure XPath at all). I used XQuery for it, which extends XPath. The code runs fine in BaseX and should also run in other XQuery engines (it may require XQuery 3.0; I didn't check).

The code got rather complex, but I think I put enough comments in there to make it comprehensible. It requires the rest of a match to continue in the directly following sibling element, but with minor adjustments it can also be used to traverse arbitrary XML structures (think of HTML with <span/>s and other markup).

(: functx dependencies :)
declare namespace functx = "http://www.functx.com";
declare function functx:is-node-in-sequence 
  ( $node as node()? ,
    $seq as node()* )  as xs:boolean {

   some $nodeInSeq in $seq satisfies $nodeInSeq is $node
 } ;
declare function functx:distinct-nodes 
  ( $nodes as node()* )  as node()* {

    for $seq in (1 to count($nodes))
    return $nodes[$seq][not(functx:is-node-in-sequence(
                                .,$nodes[position() < $seq]))]
 } ;

declare function local:search( $elements as item()*, $pattern as xs:string) as item()* {
  functx:distinct-nodes(
    for $element in $elements
    return ($element[contains(./text(), $pattern)], local:start-search($element, $pattern))
  )
};

declare function local:start-search( $element as item(), $pattern as xs:string) as item()* {
    let $splits := (
      (: all possible prefixes of search token :)
      for $i in 1 to string-length($pattern) - 1
      (: check whether element text ends with prefix :)
      where ends-with($element/text(), substring($pattern, 1, $i))
      return $i
    )
    (: go on for all matching prefixes :)
    for $split in $splits
    return
      (: recursive call to next element :)
      let $continue := local:continue-search($element/following-sibling::*[1], substring($pattern, $split+1))
      where not(empty($continue))
      return ($element, $continue)
};

declare function local:continue-search( $element as item()*, $pattern as xs:string) as item()* {
  if (empty($element)) then () else
  (: case a) text node contains whole remaining token :)
  if (starts-with($element/text(), $pattern))
  then ($element)
  (: case b) text node is part of token :)
  else if (starts-with($pattern, $element/text()))
  then
    (: recursive call to next element :)
    let $continue := local:continue-search($element/following-sibling::*[1], substring($pattern, 1+string-length($element/text())))
    where not(empty($continue))
    return ($element, $continue)
  (: token not found :)
  else ()
};

let $token := 'll'
return local:search(//div, $token)
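
Since the question allows "any other means", here is a rough Python sketch of a different technique (not a translation of the XQuery above): concatenate the text, remember which character range each element covers, and report every element whose range overlaps a match. The function name `find_ids` and the inlined sample document are assumptions made for illustration; only the standard library `xml.etree.ElementTree` is used, and only the direct children of the root are considered.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined for the sketch.
XML = """<root>
    <div id="1">Hello</div>
    <div id="2">Hel</div>
    <div id="3">lo dude</div>
    <div id="4">H</div>
    <div id="5">el</div>
    <div id="6">lo</div>
</root>"""


def find_ids(xml_text, token):
    """Return the ids of child elements whose text takes part in a match."""
    children = list(ET.fromstring(xml_text))
    # Record the [start, end) offset of each element in the concatenated text.
    spans, pos = [], 0
    for child in children:
        text = child.text or ""
        if text:  # elements without text cannot contribute to a match
            spans.append((pos, pos + len(text), child.get("id")))
        pos += len(text)
    joined = "".join(child.text or "" for child in children)

    hits = []
    start = joined.find(token)
    while start != -1:
        end = start + len(token)
        for lo, hi, elem_id in spans:
            # Keep every element whose range overlaps the match range.
            if lo < end and start < hi and elem_id not in hits:
                hits.append(elem_id)
        # Advance by one character so overlapping matches are found too.
        start = joined.find(token, start + 1)
    return hits


print(find_ids(XML, "ll"))
```

For the sample document and the token 'll', this yields the ids 1, 2, 3, 5 and 6, the same elements the question asks for. Nested markup would need a different way of collecting the text, for example walking all text nodes instead of only the direct children.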
Jens Erat
  • Just saw your answer, thanks a lot! I'm going to try this right now. Wow! It seems complex, but so is the problem! – user2273807 Apr 17 '13 at 13:34
0

In XPath 2 you can use tokenize to count how often the searched text occurs, and then test for each node whether leaving that node out of the text reduces the number of occurrences. If the number is reduced, that node has to be included in the result. This is not very fast, though.

Assuming only the text in the direct child nodes matters, like in the example, it looks like this:

for $searched in "ll" 
return //*/(for $matches in count(tokenize(string-join(*, ""), $searched)) - 1
            return *[$matches > count(tokenize(concat(" ",string-join(preceding-sibling::*, "")), $searched)) +
                                count(tokenize(concat(" ",string-join(following-sibling::*, "")), $searched)) - 2])
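
To make the counting idea easier to follow, here is a minimal Python sketch of the same test, assuming a literal (non-regex) search token and looking only at the direct children of the root. The function name `nodes_needed_for_matches` and the inlined sample document are made up for illustration. `str.count` counts non-overlapping occurrences, which plays the role of `count(tokenize(...)) - 1` in the expression above; note that `tokenize` treats its second argument as a regular expression, so the two behave differently for tokens that contain regex metacharacters.

```python
import xml.etree.ElementTree as ET

# Sample document from the question, inlined for the sketch.
XML = """<root>
    <div id="1">Hello</div>
    <div id="2">Hel</div>
    <div id="3">lo dude</div>
    <div id="4">H</div>
    <div id="5">el</div>
    <div id="6">lo</div>
</root>"""


def nodes_needed_for_matches(xml_text, token):
    """Return ids of children whose removal lowers the number of matches."""
    children = list(ET.fromstring(xml_text))
    texts = [child.text or "" for child in children]
    # Occurrences in the full concatenated text.
    total = "".join(texts).count(token)

    needed = []
    for i, child in enumerate(children):
        # Count the text before and after the node separately, so that
        # no new match is created by gluing the two halves together.
        before = "".join(texts[:i]).count(token)
        after = "".join(texts[i + 1:]).count(token)
        if total > before + after:
            needed.append(child.get("id"))
    return needed


print(nodes_needed_for_matches(XML, "ll"))
```

Each child triggers two fresh concatenations and counts, so the work grows roughly quadratically with the amount of text, which matches the remark that this approach is not very fast.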
BeniBela
  • Thanks for your answer. I'm not sure it can work for any search pattern but I must admit that I'm not used to XPath and I'll have to try and see... – user2273807 Apr 17 '13 at 13:39