Selecting connected vertices on a dependency tree and ordering them by depth (SPARQL)

Question

I’m using SPARQL for RDF and am having trouble coming up with a query that will allow me to select a set of vertices on a tree, and have all connected (direct and transitive) vertices to those selected be returned, and in the order of depth that they exist on the dependency tree (deepest vertices first).

Here is a visual representation of the graph (tree) I’m working with, which contains nodes of RDF type :dependency. In this tree, dependencies have outgoing edges to each dependency that they require, and this is represented on the graph using a :requires edge.

Here is the data that creates this graph:

INSERT DATA {
   :A             rdf:type   :dependency .
   :B             rdf:type   :dependency .
   :C             rdf:type   :dependency .
   :D             rdf:type   :dependency .
   :E             rdf:type   :dependency .
   :F             rdf:type   :dependency .
   :G             rdf:type   :dependency .
   :H             rdf:type   :dependency .
   :I             rdf:type   :dependency .
   :J             rdf:type   :dependency .
   :K             rdf:type   :dependency .

   :A             :requires  :B .
   :A             :requires  :C .
   :A             :requires  :I .
   :B             :requires  :J .
   :B             :requires  :E .
   :C             :requires  :D .
   :E             :requires  :F .
   :E             :requires  :G .
   :D             :requires  :G . 
   :H             :requires  :I .     
   :H             :requires  :K .
}

I would like to form a query that will allow me to select a set of dependencies, and return all dependencies of that selection in an order in which the lowest :required dependencies are returned first, such that if "Dependency X" :requires "Dependency Y", then Dependency X must come after Dependency Y in the result set.

Essentially, I want to ask the graph:

"Give me all vertices that are required by this set of vertices, and return them in the order so that the lowest level dependencies are prioritized first on the result set"

For example, in the case of specifying this selection on the above graph: [B, H]

A valid result set would be: [G, F, E, J, B, I, K, H]

(Since G and F are the lowest dependencies of B, followed by E, J and so on…)

The following result set would be also considered valid: [J, F, G, E, B, I, K, H]

since it does not matter whether J is returned before or after F, G, and H as it doesn't depend on them (all that matters is it is returned before B, since B requires all of them)

So far, I have attempted this query for the selection [B, H], which is able to return all required dependencies of B and H, but they are not returned in the order of lowest dependency priority.

SELECT ?requires {
    ?dependency a :dependency .
    FILTER (?dependency IN (:B, :H))
    ?dependency :requires+ ?requires .
}

which returns the result set in the incorrect order: [E, F, G, J, I, K]

Anyone have any ideas how I could construct a query that essentially performs a depth-first search ordering of the vertices?

honestly, SPARQL is not the appropriate language for this. It's basically a pattern matching query, not a graph traversal query. So, in my opinion, graph traversal languages like Cypher, Gremlin, etc. would be more suitable. I also thing that you can't do anything of their features in SPARQL, especially not in reasonable time. — UninformedUser, Apr 02 '21 at 04:40
long story short, there have been answers here that might come close to what you want, or at least can might be reused: https://stackoverflow.com/a/18991480/4744359 and https://stackoverflow.com/a/18032019/4744359 - so basically you would have to work with subqueries — UninformedUser, Apr 02 '21 at 04:42
two more questions: 1) why is `D` in the expected result of your example? 2) what speaks against getting all edges "below" the given nodes, then doing the DFS on the client side? I mean, with code it's trivial, with SPARQL not (or maybe even impossible) — UninformedUser, Apr 02 '21 at 09:22
@UninformedUser Thanks for your responses! I think I may actually be able to reuse one of those answers, particularly the [second one](https://stackoverflow.com/questions/18024413/finding-all-steps-in-property-path/18032019#18032019) you shared seems to be close enough to what I want (with a slight modification that I’ll include as an answer to this question). — Kevin Andres, Apr 02 '21 at 18:57
For your questions: 1) I just realized this was a mistake on my part as the graph data I originally included has an edge for `:B :requires :D ` however I did not reflect that in the visual representation. I’m updating the graph data and expected results so that D is not included . — Kevin Andres, Apr 02 '21 at 18:57
2) I have nothing against using a combination of code and SPARQL for solving this. I actually wasn’t even sure whether such a query would be possible with SPARQL alone. When you say “getting all edges ‘below’ the given nodes”, do you mean similarly to how it’s done in [this answer](https://stackoverflow.com/questions/18024413/finding-all-steps-in-property-path/18032019#18032019) using property paths to get all edges from the specified starting nodes until the bottom is reached, while maintaining a counter to determine the depth of each edge? — Kevin Andres, Apr 02 '21 at 18:58
I am posting a separate answer to this question with a query that I think may work for me. Would love to get your thoughts on it, as it is very similar to the answer you shared. — Kevin Andres, Apr 02 '21 at 18:58

score 1 · Accepted Answer · answered Apr 02 '21 at 19:00

Thanks to this answer shared by @UninformedUser, I can get a result set that is close enough to what I’m looking for. Basically we can use property paths to determine all :requires edges that are underneath a set of specified vertices, while keeping a counter that will return the depth of each of those edges:

If we want to determine the order for this selection [B, H], then we can use this query, which is a slightly modified version of the query in the above answer, to allow for the selection of the beginning vertices instead of selecting all paths from the top:

select ?begin ?midI ?midJ (count(?counter) as ?position) ?end where {
  ?begin a :dependency .
  FILTER (?begin IN (:B, :H))
  ?begin :requires* ?counter .
  ?counter :requires* ?midI .

  ?midI :requires ?midJ .

  ?midJ :requires* ?end .
  FILTER NOT EXISTS { ?end :requires [] }
}
group by ?begin ?end ?midI ?midJ 
order by DESC(?position) ?begin ?end

This returns the following result set:

----------------------------------------
| begin | midI | midJ | position | end |
========================================
| :B    | :E   | :F   | 2        | :F  |
| :B    | :E   | :G   | 2        | :G  |
| :B    | :B   | :E   | 1        | :F  |
| :B    | :B   | :E   | 1        | :G  |
| :B    | :B   | :J   | 1        | :J  |
| :H    | :H   | :I   | 1        | :I  |
| :H    | :H   | :K   | 1        | :K  |
----------------------------------------

This is sufficient to allow me to perform the remainder of the sorting with code on the client side, where for each unique value in the begin column, collect all values of midJ (starting from top to bottom), until all rows of the same begin value are exhausted, then finally include the value in the begin column. When doing that for the above result set, we get a valid answer: [F, G, E, B, J, I, K, H]

Selecting connected vertices on a dependency tree and ordering them by depth (SPARQL)

1 Answers1