I have been using the post "SPARQL query to get all parent of a node" to get the parents, or lineage, of a single RDF node.
The query below works nicely on my Virtuoso server. Sorry, I couldn't find a public endpoint containing data with a similar structure.
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix bto: <http://purl.obolibrary.org/obo/>
select (group_concat(distinct ?midlab ; separator = "|") AS ?lineage)
where
{
bto:BTO_0000207 rdfs:subClassOf* ?mid .
?mid rdfs:subClassOf* ?class .
?mid rdfs:label ?midlab .
}
group by ?lineage
order by (count(?mid) as ?ordercount)
giving
+---------------------------------------------------------+
| lineage |
+---------------------------------------------------------+
| bone|cartilage|connective tissue|tibia|tibial cartilage |
+---------------------------------------------------------+
Then I wondered if I could get the lineage for all nodes by changing the select to
select ?s (group_concat(distinct ?midlab ; separator = "|") AS ?lineage)
and the first triple pattern in the where clause to
?s rdfs:subClassOf* ?mid .
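Put together, the modified query would look roughly like the sketch below. The `group by ?s` line is an assumption, since only the two changes above are described: standard SPARQL 1.1 requires every non-aggregated variable in the select to appear in the group by. The order by line is left out because it ordered a single result row.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# One row per starting node ?s, with the labels of all its ancestors concatenated.
select ?s (group_concat(distinct ?midlab ; separator = "|") as ?lineage)
where
{
  ?s rdfs:subClassOf* ?mid .       # every ancestor of every node, including the node itself
  ?mid rdfs:subClassOf* ?class .   # kept from the single-node query
  ?mid rdfs:label ?midlab .
}
group by ?s                        # assumption: grouping key changed to ?s
```

With ?s unbound, the two chained `rdfs:subClassOf*` paths are evaluated for every subject in the store, so the intermediate result is far larger than in the single-node case.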
Those who have more SPARQL experience than I will probably not be surprised that the query timed out.
Is this a reasonable approach? Am I doing something wrong syntactically?
I suspect that the distinct keyword or the group by clause is the bottleneck, because the following query, which has neither and is restricted to the BrendaTissueOBO namespace, takes only a second or two:
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix bto: <http://purl.obolibrary.org/obo/>
select ?s ?midlab
where
{
?s rdfs:subClassOf* ?mid .
?mid rdfs:subClassOf* ?class .
?mid rdfs:label ?midlab .
?s <http://www.geneontology.org/formats/oboInOwl#hasOBONamespace> "BrendaTissueOBO"^^<http://www.w3.org/2001/XMLSchema#string> .
}
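One way to test that suspicion would be to add the aggregation back onto this fast query and see whether it still returns quickly. The following is an untested sketch, not a known fix. The `oboInOwl` and `xsd` prefixes are introduced here only for readability, and dropping the `?class` pattern is an assumption: because `?class` is never selected and `rdfs:subClassOf*` always matches the zero-length path, that pattern should only multiply rows without changing the distinct labels.

```sparql
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

select ?s (group_concat(distinct ?midlab ; separator = "|") as ?lineage)
where
{
  # Restrict the starting nodes first, as in the fast query above.
  ?s oboInOwl:hasOBONamespace "BrendaTissueOBO"^^xsd:string .
  ?s rdfs:subClassOf* ?mid .
  ?mid rdfs:label ?midlab .
}
group by ?s
```

If this version is also slow, the aggregation is the likely culprit. If it is fast, the timeout more likely came from running the unrestricted property paths over every subject.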