SPARQL to get all parents of all nodes

Question

I have been using this post to get the parents or lineage of a single RDF node: SPARQL query to get all parent of a node

This works nicely on my Virtuoso server. Sorry, I couldn't find a public endpoint containing data with a similar structure.

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix bto: <http://purl.obolibrary.org/obo/>
select (group_concat(distinct ?midlab ; separator = "|") AS ?lineage)
where
{ 
  bto:BTO_0000207 rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid rdfs:label ?midlab .
}
group by ?lineage
order by (count(?mid) as ?ordercount)

giving

+---------------------------------------------------------+
|                         lineage                         |
+---------------------------------------------------------+
| bone|cartilage|connective tissue|tibia|tibial cartilage |
+---------------------------------------------------------+

Then I wondered if I could get the lineage for all nodes by changing the select to

select ?s (group_concat(distinct ?midlab ; separator = "|") AS ?lineage)

and the first line in the where statement to

?s rdfs:subClassOf* ?mid .

Those who have more SPARQL experience than I will probably not be surprised that the query timed out.

Is this a reasonable approach? Am I doing something wrong syntactically?

I suspect that the distinct keyword or group clause are bottlenecks, because this only takes a second or two:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix bto: <http://purl.obolibrary.org/obo/>
select ?s ?midlab
where
{ 
  ?s rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid rdfs:label ?midlab .
  ?s <http://www.geneontology.org/formats/oboInOwl#hasOBONamespace> "BrendaTissueOBO"^^<http://www.w3.org/2001/XMLSchema#string> .
}

score 2 · Accepted Answer · answered Jul 21 '15 at 14:42

Your first query isn't legal SPARQL; you can check it with sparql.org's query validator. While you can order by count(?mid), you can't bind the value to a variable with (... as ...) inside the order by clause. Dropping the binding gives you:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix bto: <http://purl.obolibrary.org/obo/>
select (group_concat(distinct ?midlab ; separator = "|") AS ?lineage)
where
{ 
  bto:BTO_0000207 rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid rdfs:label ?midlab .
}
group by ?lineage
order by count(?mid)

Now, that's legal, but it doesn't make quite as much sense. group_concat requires that you have some groups, and that you'll do a concatenation for the values within each group. In the absence of a group by clause, you get an implicit group, so the group_concat without a group by is OK. But you've got a group by ?lineage that doesn't make a whole lot of sense, because ?lineage already only has one value per group (since it's already an aggregate). Better would be to group by ?s, as in the following. This seems more correct, and might not time out:

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?s (group_concat(distinct ?midlab ; separator = "|") AS ?lineage)
where
{ 
  ?s rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid rdfs:label ?midlab .
}
group by ?s
order by count(?mid)
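
If you also want the count in the results, a portable way to write it is to bind the aggregate in the select clause and have the order by refer to the resulting variable. The following is only a sketch of that form: the variable name ?midcount is just an illustrative choice, and I haven't run this against your data, so whether it finishes before the timeout on your server is an open question.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Bind the aggregate in the select clause (legal), then
# order by the bound variable instead of by "(... as ...)".
select ?s
       (group_concat(distinct ?midlab ; separator = "|") as ?lineage)
       (count(?mid) as ?midcount)   # ?midcount is an illustrative name
where
{
  ?s rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid rdfs:label ?midlab .
}
group by ?s
order by ?midcount
```

Note that count(?mid) counts solution rows, not distinct ancestors: every ?mid appears once for each ?class it reaches, so the number is larger than the length of the lineage.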

answered Jul 21 '15 at 14:42

Joshua Taylor

This is looking very promising. I added a from clause and changed the order by to "(count(?mid) as ?midcount)" and got a reasonable result. – Mark Miller Jul 21 '15 at 14:53
@MarkMiller `order by (... as ...)` **isn't legal**. Virtuoso might accept it (it accepts a number of non-standard syntaxes), but it's not legal SPARQL. If you ever need to run your query against another endpoint, it's very likely that it **won't work**. You can `select (count(?mid) as ?midcount) { ... } order by ?midcount`, and you can `select ... { ... } order by count(?mid)`, but you can't `select ... { ... } order by (count(?mid) as ?midcount)`. The variable binding form `(... as ...)` isn't legal in `order by ...`. (It *is* legal in a `group by`, however.) – Joshua Taylor Jul 21 '15 at 15:28
I really appreciate your feedback and apologize for not studying more carefully. I also use Jena and am evaluating MarkLogic, so best practices are important to me. "order by count(?mid)" raised "Virtuoso 37000 Error SP030: SPARQL compiler, line 12: syntax error at '(' before '?mid'" Just enclosing it in parentheses without an AS does work and passes the validator. – Mark Miller Jul 21 '15 at 16:42
@MarkMiller Yes, Virtuoso's quirks are a semi-constant source of frustration. Some of the big publicly accessible endpoints (e.g., DBpedia) use Virtuoso, so it's where a lot of people start, but they end up learning some bad habits. :) It's less common to see *correct* syntax that Virtuoso *doesn't* accept (*incorrect* syntax that *is* accepted is more common), so that **order by** issue is kind of frustrating. I'm glad you found a workaround (with parentheses), though. – Joshua Taylor Jul 22 '15 at 13:17
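
For reference, the parenthesized workaround from the comments can be sketched as below. The comments report that this form is accepted by Virtuoso and passes the validator. The hasOBONamespace restriction is copied from the question's last query to limit ?s to BRENDA tissue terms; the prefix names oboInOwl and xsd are illustrative choices, and the query has not been run against the original data.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

select ?s (group_concat(distinct ?midlab ; separator = "|") as ?lineage)
where
{
  # Restrict ?s to one ontology namespace, as in the question's last query.
  ?s oboInOwl:hasOBONamespace "BrendaTissueOBO"^^xsd:string .
  ?s rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid rdfs:label ?midlab .
}
group by ?s
# Aggregate wrapped in parentheses, with no "as ?var" binding.
order by (count(?mid))
```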
