9

I'm new to SPARQL and I'm trying to create a property path query that will spit out each intermediate step along the path. So far I have this:

select ?object
where {
  <subjectURI> <isRelatedTo>+ ?object .
}

This gives me a list of all the relations to my subject URI throughout the path, no matter how distantly the relation (correct me if I'm wrong so far).

But, I'd like to see how the relations are organized. Something like:

<subjectURI> <isRelatedTo> <object1>
<object1> <isRelatedTo> <object2>
<object2> <isRelatedTo> <object3>

and so on...Is this possible?

Joshua Taylor
  • 84,998
  • 9
  • 154
  • 353
bdkauff
  • 225
  • 3
  • 13

2 Answers2

10

While there are some limitations in what property paths can do, depending on your exact requirements, you may be able to get what you need here. Consider this data:

@prefix : <urn:ex:>.

:a :relatedTo :b .
:b :relatedTo :c .
:c :relatedTo :d .

:a :relatedTo :e .
:e :relatedTo :f .
:f :relatedTo :g .

:h :relatedTo :i .
:i :relatedTo :j .
:j :relatedTo :k .
:k :relatedTo :l .

in which there are three :relatedTo paths:

a --> b --> c --> d
a --> e --> f --> g
h --> i --> j --> k --> l

I realize that in your case, you had a specific subject, but we can generalize a little bit, and ask for each edge in each of these paths with a query like this:

prefix : <urn:ex:>
select * where {
  # start a path
  ?begin :relatedTo* ?midI .
  FILTER NOT EXISTS { [] :relatedTo ?begin }

  # grab next edge
  ?midI :relatedTo ?midJ .

  # get to the end of the path.
  ?midJ :relatedTo* ?end .
  FILTER NOT EXISTS { ?end :relatedTo [] }
}
order by ?start ?end

$ arq --data data.n3 --query query.sparql
-----------------------------
| begin | midI | midJ | end |
=============================
| :a    | :a   | :b   | :d  |
| :a    | :b   | :c   | :d  |
| :a    | :c   | :d   | :d  |
| :a    | :a   | :e   | :g  |
| :a    | :e   | :f   | :g  |
| :a    | :f   | :g   | :g  |
| :h    | :h   | :i   | :l  |
| :h    | :i   | :j   | :l  |
| :h    | :j   | :k   | :l  |
| :h    | :k   | :l   | :l  |
-----------------------------

which shows each edge of each :relatedTo path. You could make the output a little bit prettier, too:

prefix : <urn:ex:>
select (concat(str(?begin),"--",str(?end)) as ?path) ?midI ?midJ where {
  # start a path
  ?begin :relatedTo* ?midI .
  FILTER NOT EXISTS { [] :relatedTo ?begin }

  # grab next edge
  ?midI :relatedTo ?midJ .

  # get to the end of the path.
  ?midJ :relatedTo* ?end .
  FILTER NOT EXISTS { ?end :relatedTo [] }
}
order by ?path

$ arq --data data.n3 --query query.sparql
--------------------------------------
| path                 | midI | midJ |
======================================
| "urn:ex:a--urn:ex:d" | :a   | :b   |
| "urn:ex:a--urn:ex:d" | :b   | :c   |
| "urn:ex:a--urn:ex:d" | :c   | :d   |
| "urn:ex:a--urn:ex:g" | :a   | :e   |
| "urn:ex:a--urn:ex:g" | :e   | :f   |
| "urn:ex:a--urn:ex:g" | :f   | :g   |
| "urn:ex:h--urn:ex:l" | :h   | :i   |
| "urn:ex:h--urn:ex:l" | :i   | :j   |
| "urn:ex:h--urn:ex:l" | :j   | :k   |
| "urn:ex:h--urn:ex:l" | :k   | :l   |
--------------------------------------

This same approach would let you do some interesting things like finding out how far separated certain nodes are:

prefix : <urn:ex:>
select ?begin ?end (count(*) as ?length) where {
  # start a path
  ?begin :relatedTo* ?midI .
  FILTER NOT EXISTS { [] :relatedTo ?begin }

  # grab next edge
  ?midI :relatedTo ?midJ .

  # get to the end of the path.
  ?midJ :relatedTo* ?end .
  FILTER NOT EXISTS { ?end :relatedTo [] }
}
group by ?begin ?end 

------------------------
| begin | end | length |
========================
| :a    | :g  | 3      |
| :a    | :d  | 3      |
| :h    | :l  | 4      |
------------------------

In the data I provided above, the paths happen to be in alphabetical order and so the sorting produces the edges in the right order. However, even if the edge nodes aren't in alphabetical, we can still print them in order by computing their position in the list. This query:

prefix : <urn:ex:>
select ?begin ?midI ?midJ (count(?counter) as ?position) ?end where {
  ?begin :relatedTo* ?counter .
  ?counter :relatedTo* ?midI .
  FILTER NOT EXISTS { [] :relatedTo ?begin }

  ?midI :relatedTo ?midJ .

  ?midJ :relatedTo* ?end .
  FILTER NOT EXISTS { ?end :relatedTo [] }
}
group by ?begin ?end ?midI ?midJ 

----------------------------------
| begin | midI | midJ | .1 | end |
==================================
| :a    | :a   | :b   | 1  | :d  |
| :a    | :b   | :c   | 2  | :d  |
| :a    | :c   | :d   | 3  | :d  |
| :a    | :a   | :e   | 1  | :g  |
| :a    | :e   | :f   | 2  | :g  |
| :a    | :f   | :g   | 3  | :g  |
| :h    | :h   | :i   | 1  | :l  |
| :h    | :i   | :j   | 2  | :l  |
| :h    | :j   | :k   | 3  | :l  |
| :h    | :k   | :l   | 4  | :l  |
----------------------------------

We don't necessary need to see that count, but you could, instead of selecting the position, you could use it as a sorting condition:

prefix : <urn:ex:>
select ?begin ?midI ?midJ ?end
 where {
  ?begin :relatedTo* ?counter .
  ?counter :relatedTo* ?midI .
  FILTER NOT EXISTS { [] :relatedTo ?begin }

  ?midI :relatedTo ?midJ .

  ?midJ :relatedTo* ?end .
  FILTER NOT EXISTS { ?end :relatedTo [] }
}
group by ?begin ?end ?midI ?midJ 
order by ?begin ?end count(?counter)

and be guaranteed to get your edges in sequence.

UninformedUser
  • 8,397
  • 1
  • 14
  • 23
Joshua Taylor
  • 84,998
  • 9
  • 154
  • 353
  • Thank you for the response. What you suggest works up to a point, but as I add more than a few steps, the query responds with nothing. According to this query: `select ?x (COUNT(?z) AS ?linkTotal) where { ?x:relatedTo+ ?z . } group by ?x HAVING (COUNT(?x) > 1)` I get a max value of 12. Doesn't this mean that the longest shortest path is 12 steps, and therefore I should be able to add 12 (err, 10?) intermediate steps to your query and still get one matching case? – bdkauff Aug 08 '13 at 19:07
  • @user2350906 Without seeing your data, it's hard to tell what the query will return. I do notice though, that you're using `+`, which I didn't use in any of my queries. `+` means one or more, whereas `*` is 0 or more. – Joshua Taylor Aug 08 '13 at 19:32
  • @user2350906 Can you elaborate on this any more? The difference between `+` is significant, and if you're grouping by `?x`, should that `HAVING` clause be `HAVING (COUNT(?z) > 1)` rather than `HAVING COUNT(?x) > 1)` (i.e., count `?z`, not `?y`)? – Joshua Taylor Aug 09 '13 at 13:11
  • @JoshuaTaylor Thanks for the great answer. I have one problem similar to this one. In my case there are multiple paths between two nodes. I want to get all the paths from start to end node. Please help. – Dixit Singla Jun 25 '18 at 09:23
  • @DixitSingla I'm not sure that you can do that conveniently in SPARQL. You can get all the *edges* of all paths between the two nodes, but you'd still have to a extract the individual *paths* afterward. – Joshua Taylor Jun 25 '18 at 16:14
7

No, this is a limitation of the design of property paths.

Paths may either be used to compact more complex query patterns or they may be used to test for arbitrary length paths as in your example.

The former may be converted into a form that gives you the intermediate steps e.g.

SELECT * WHERE
{
  ?s <http://predicate>/<http://predicate> ?o
}

Can be converted to the following:

SELECT * WHERE
{
  ?s <http://predicate> ?intermediate .
  ?intermediate <http://predicate> ?o .
}

Unfortunately the same cannot be done for arbitrary length paths. However if you know what the upper bound on paths are you can rewrite your query like so:

SELECT *
WHERE
{
  {
    ?s <http://predicate> ?step1 .
    ?step1 <http://predicate> ?o .
  }
  UNION
  {
    ?s <http://predicate> ?step1 .
    ?step1 <http://predicate> ?step2 .
    ?step2 <http://predicate> ?o .
  }
  # Add additional UNION for each length of path you want up to your upper bound
}

Though as you can immediately see this makes things very verbose.

RobV
  • 28,022
  • 11
  • 77
  • 119
  • Thanks! I think I recall reading about this limitation. The solution is verbose, but probably less so than trying to reconstruct the hops post-query. – bdkauff Aug 02 '13 at 21:10
  • 1
    @user2350906 While there are some limitations to what you can do with property paths, I think you can get the information that you're looking for from a query that uses property paths, and I've described it in [an answer](http://stackoverflow.com/a/18032019/1281433). – Joshua Taylor Aug 03 '13 at 10:50