9

Is there a good kind of SPARQL query that let's me answer if two given nodes are connected on a single / multiple SPARQL endpoints?

Let's say i want to check if the two nodes

<http://wiktionary.dbpedia.org/resource/dog>

and

<http://dbpedia.org/resource/Dog>

are connected. If yes, i'd be interested in the path.

By guessing i already knew they were connected via the label, so a query like this returns a path of length 3:

SELECT * WHERE {
  <http://wiktionary.dbpedia.org/resource/dog> ?p1 ?n1. 
  # SERVICE <http://dbpedia.org/sparql> {
    <http://dbpedia.org/resource/Dog> ?p2 ?n1 .
  # }
}

try yourself

Now what if i don't have an idea yet and want to do this automatically & for arbitrary length and direction?

I'm aware of SPARQL 1.1's property paths, but they only seem to work for known properties (http://www.w3.org/TR/sparql11-query/#propertypaths):

Variables can not be used as part of the path itself, only the ends.

Also i would want to allow for any path, so the predicates on the path may change.

My current (as i find ridiculous) approach is to query for all possible paths of length k up to a limit of n.

Dumping isn't an option for me as it's billions of triples... I want to use SPARQL!

Jörn Hees
  • 3,338
  • 22
  • 44
  • I'm pretty sure this is a duplicate. SPARQL won't let you determine *the* path, but you can check whether there is *a* path. – Joshua Taylor Jun 18 '15 at 13:10
  • E.g., have a look at [path between two resources](http://stackoverflow.com/q/19587520/1281433), [getting a graph path using SPARQL](http://stackoverflow.com/questions/28897701/getting-a-graph-path-using-sparql), [Querying a Graph path in SPARQL](http://stackoverflow.com/questions/28900290/querying-a-graph-path-in-sparql), [Finding all steps in property path](http://stackoverflow.com/q/18024413/1281433), ... – Joshua Taylor Jun 18 '15 at 13:11
  • **"Variables can not be used as part of the path itself, only the ends."** You can't use variables per se in a path, but you can use a wildcard of the form `(<>|!<>)`. That works, because every URI is either `<>` or isn't, so it matches `<>|!<>`. – Joshua Taylor Jun 18 '15 at 13:14
  • I think [my answer](http://stackoverflow.com/a/18032019/1281433) to **Finding all steps in property path** is pretty close to what you're asking for, once you add the wildcard (and adjust it to go in either direction). – Joshua Taylor Jun 18 '15 at 13:16
  • Do you actually need to know the path, or just *whether* there's a path? – Joshua Taylor Jun 18 '15 at 13:17
  • I'd actually be interested in the path if there is one... and in that answer you know the predicate... i don't! – Jörn Hees Jun 18 '15 at 14:21
  • There's no path return type, so the best you can do is extract the triples that make up the edges on the path. That's something you can do. You'd just do `?start * ?u . ?u ?p ?v . ?v * ?end`. Then every edge in the path is captured by a ?u ?p ?v binding. – Joshua Taylor Jun 18 '15 at 14:34
  • hmm, that would return a couple of `?u ?p ?v` bindings which aren't necessarily parts of the same path (but mixed), no? if i could use a regexp like `{k}` that would even allow me to restrict this down to not "everything" in case it's linked to owl:Thing somehow and everything else as well... without that i can still use this but need to actually insert the wildcard `k` times manually :-/ Then i'm actually back to manually querying all paths of length `k`... – Jörn Hees Jun 18 '15 at 15:00
  • SPARQL doesn't provide any way to distinguish between multiple paths, so yes, you'd get ?u ?p ?v matches for different paths. – Joshua Taylor Jun 18 '15 at 15:10

1 Answers1

17

While you can't use variables in property paths, you can use a wildcard by taking advantage of the fact that for any URI, every property either is that property or isn't. E.g., (<>|!<>) matches any property, since every property either is <> or isn't. You can make a wildcard that goes in either direction by alternating that with itself in the other direction: (<>|!<>)|^(<>|!<>). That means that there's a path, with properties going in either direction, between two nodes ?u and ?v when

?u ((<>|!<>)|^(<>|!<>))* ?v

For instance, the following query should return true (indicating that there is a path):

ASK {
  <http://wiktionary.dbpedia.org/resource/dog> ((<>|!<>)|^(<>|!<>))* <http://dbpedia.org/resource/Dog> 
}

Now, to actually get the links of a path between between two nodes, you can do (letting <wildcard> stand for the nasty looking wildcard):

?start <wildcard>* ?u .
?u ?p ?v .
?v <wildcard>* ?end .

Then ?u, ?p, and ?v give you all the edges on the path. Note that if there are multiple paths, you'll be getting all the edges from all the paths. Since your wildcards go in either direction, you can actually get to anything reachable from the ?start or ?end, so you really should consider restricting the wildcard somehow.

On the endpoint that you linked to, it doesn't, but that appears to be an issue with Virtuoso's implementation of property paths, rather than a problem with the actual query.

Do note that this will be trivially satisfied in many cases if you have any kind of inference happening. E.g., if you're using OWL, then every individual is an instance of owl:Thing, so there'd always be a path of the form:

        ?u &rightarrow;rdf:type owl:Thing &leftarrow;rdf:type ?v

Joshua Taylor
  • 84,998
  • 9
  • 154
  • 353
  • upvote for the `((<>|!<>)|^(<>|!<>))*` trick ;), but yes, i'd want to know the path as described above (so i could for example exclude those via owl:Thing) – Jörn Hees Jun 18 '15 at 14:51
  • @JörnHees I've updated my answer, but note that if the wildcard can match properties in either direction, then you could always go "away" from start and then come back through and go to end. So you'll be able to get lots of triples that aren't on an "obvious" path. You'll probably want to make the wildcards directional. – Joshua Taylor Jun 18 '15 at 14:58
  • yes, i thought about the back and forth as well. Your `?u ?p ?v` actually makes use of it (otherwise you don't get ?v ?p ?u edges). Also the duplication of the edges is easy to get rid of with a distinct... ATM i'm more worried about the connected super-component via owl:Thing... The more i think about this, the more the problem i'm trying to solve implies an upper bound for the path length to not return "everything". I could step through them like `?start (/...) ?u . ?u ?p ?v . ?v (/...) ?end`, with k occurrences of + the same for `?v ?p ?u`. – Jörn Hees Jun 18 '15 at 15:11
  • @JörnHees If you can narrow down the problem specification, you might be able to get a better solution. E.g., if you want to exclude rdf:type links, you can make the wildcard !(rdf:type|!rdf:type). In any case, the more you can restrict the paths, the better off you'll be, I expect. – Joshua Taylor Jun 18 '15 at 15:14
  • yupp, i'll just accept this as it already gave me "``" as a pretty cool tool ;) – Jörn Hees Jun 18 '15 at 15:18
  • Not sure why, but on GraphDB the wildcard ((<>|!<>)|^(<>|!<>))* generates an error: "MALFORMED QUERY: Not a valid (absolute) URI: " – magicrebirth Aug 17 '16 at 22:08
  • @magicrebirth Well, `<>` isn't an absolute URI, though I'd have expected it to get resolved to one. But you can use any URI in its place. E.g., `ex:foo|!ex:foo`. – Joshua Taylor Aug 18 '16 at 02:44
  • @JoshuaTaylor, i'm using part of the solution which isn't working probably because i'm wrongly using it, can I get a help, question is at http://stackoverflow.com/questions/43317635/sparql-path-between-two-nodes – Noor Apr 10 '17 at 07:58