1

I have a task. I must find shortest path between two nodes in XML using BFS. The XML is 'Romeo & Juliet' (you can find it here: enter link description here).

I must find shortest paths between Juliet and others speakers. Nodes in graph should be speakers, edges - speech in which two speakers appear together. For example: if Juliet talk with X, distance will be 1. But if Juliet talk with X (but not with Y) and X talk with Y, distance is 2.

The result should looks like that (it's only example):

<speaker name="Romeo"><distance>1</distance></speaker>
<speaker name="Tybalt"><distance>3</distance></speaker>
<speaker name="Benvolio"><distance>2</distance></speaker>
...

I prepare same code, but i don't have idea on BFS:

declare function local:BFS($queue,$speakers,$visited,$step)
{
  let $u :=
    for $i in $queue
        for $w in local:neighbour($i)
          (..)

  return ($u)
};

declare function local:neighbour($person as xs:string)
{
let $file := "r_and_j.xml"

let $speakers :=
  distinct-values(doc($file)/PLAY/ACT/SCENE/SPEECH/SPEAKER)

let $sp-query := 

              for $y in $speakers

                where not($person=$y)
                return if(doc($file)/PLAY/ACT/SCENE[SPEECH[SPEAKER=$person] and SPEECH[SPEAKER=$y]])
                       then
                       $y
                       else ()


return $sp-query
};

Function 'neighbour' return nodes from neighbourhood (speakers who talk with person which is get from parametr).

2 Answers2

1

There was a very similar question one month ago. Please read through the answer to understand why your nested for loops will not work the way you think they will.

For your specific task it would be more fitting to view each "round" of the BFS algorithm (processing of all nodes found after a specific number of steps) instead of using a simple queue. I would also extract the neighbors once in the beginning and store them in a more easily accessible form. Then the algorithm could look like this:

declare function local:round($n, $last-round, $out) {
  if(empty($last-round)) then (
    (: last round found no new persons, we are finished :)
    $out
  ) else (
    let $current-round :=
      (: iterate over all persons found in the last round :)
      fold-left($last-round, (), function($current, $speaker) {
        (: iterate over their neighbors :)
        fold-left(
          (: ignore persons found in previous rounds :)
          $persons[@name=$speaker/@name]/neighbor/@name[not(. = $out/@name)],
          $current,
          function($current2, $neighbor) {
            if($current2/speaker[@name = $neighbor]) then (
              (: person was already found this round, ignore :)
              $current2
            ) else (
              (: append newly found person to this round's output :)
              $current2,
              <speaker name="{$neighbor}"><distance>{$n}</distance></speaker>
            )
          })
      })
    return local:round($n + 1, $current-round, ($out, $last-round))
  )
};

The whole query can be seen here. It requires XQuery 3.0 and was tested with the newest version of BaseX.

Community
  • 1
  • 1
Leo Wörteler
  • 4,191
  • 13
  • 10
-1

Your best bet here would probably be to get away from XML and model your entities in a graph database like Jena. From there you can easily establish shortest paths through both number of joins and adding weights to your relationships.

adamretter
  • 3,885
  • 2
  • 23
  • 43