I'm trying to extract all ancestors of each given GO ID (a node) using the EBI RDF SPARQL endpoint. I based my query on these two similar questions. Here are two examples illustrating the problem:
Example 1 (Link to the structure):
biological_process (GO:0008150)
|__ metabolic process (GO:0008152)
    |__ methylation (GO:0032259)
In this example, using the following query:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT (count(?mid) as ?depth)
       (group_concat(distinct ?midId ; separator = " / ") AS ?treePath)
FROM <http://rdf.ebi.ac.uk/dataset/go>
WHERE {
  obo:GO_0032259 rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid <http://www.geneontology.org/formats/oboInOwl#id> ?midId .
}
GROUP BY ?treePath
ORDER BY ?depth
I got the desired results without problems:
depth | treePath
------|-------------------------------------
6     | GO:0008150 / GO:0008152 / GO:0032259
But when the term appears in multiple branches (e.g. GO:0007267), as in the case below, the same approach doesn't work:
Example 2 (Link to the structure):
biological_process (GO:0008150)
|__ cellular_process (GO:0009987)
|   |__ cell communication (GO:0007154)
|       |__ cell-cell signaling (GO:0007267)
|
|__ signaling (GO:0023052)
    |__ cell-cell signaling (GO:0007267)
The result:
depth | treePath
------|---------------------------------------------------------------
15    | GO:0007154 / GO:0007267 / GO:0008150 / GO:0009987 / GO:0023052
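To see where 6 and 15 come from, here is a minimal Python model of what the first query appears to compute. It rests on two assumptions: the graph contains only the edges drawn in the diagrams, and the endpoint returns one `rdfs:subClassOf*` binding per distinct path rather than per distinct node. Because `?treePath` is only bound by the aggregate itself, `GROUP BY ?treePath` effectively puts every solution into one group. The count is therefore the total number of `(?mid, ?class)` rows, not a depth. `group_concat(distinct ...)` just joins the distinct IDs. SPARQL leaves their order unspecified, and here they appear to be sorted by ID. The helper names below are illustrative only.

```python
# Parent edges copied from the two diagrams (child -> list of direct parents).
# Assumption: these are the only rdfs:subClassOf edges involved.
EXAMPLE_1 = {
    "GO:0032259": ["GO:0008152"],
    "GO:0008152": ["GO:0008150"],
}
EXAMPLE_2 = {
    "GO:0007267": ["GO:0007154", "GO:0023052"],
    "GO:0007154": ["GO:0009987"],
    "GO:0009987": ["GO:0008150"],
    "GO:0023052": ["GO:0008150"],
}


def reach(parents, node):
    """Ancestors of node, including node itself, listed once per upward path.

    Mimics `node rdfs:subClassOf* ?x` if the endpoint counts paths
    (an assumption, not something documented here).
    """
    found = [node]
    for parent in parents.get(node, []):
        found.extend(reach(parents, parent))
    return found


def first_query(parents, start):
    """Model of the first query: all (?mid, ?class) rows fall into one group."""
    rows = [(mid, cls) for mid in reach(parents, start) for cls in reach(parents, mid)]
    count = len(rows)
    # sorted() only mirrors the order seen in the output; SPARQL does not
    # guarantee any order inside group_concat.
    tree_path = " / ".join(sorted({mid for mid, _ in rows}))
    return count, tree_path


if __name__ == "__main__":
    print(first_query(EXAMPLE_1, "GO:0032259"))
    print(first_query(EXAMPLE_2, "GO:0007267"))
```

In Example 1 the sorted IDs happen to coincide with the top-down order of the hierarchy. That is why the string looks like a path there and not in Example 2.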
What I want instead is one path per row:
GO:0008150 / GO:0009987 / GO:0007154 / GO:0007267
GO:0008150 / GO:0023052 / GO:0007267
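One way to get one row per path is to stop building the path string in SPARQL. Instead, fetch the direct parent edges among the term's ancestors and walk them in client code. The sketch below assumes such an edge list. `EDGE_QUERY` is a hypothetical query that has not been tested against the EBI endpoint. The `EDGES` list is copied from the Example 2 diagram and stands in for the query's results; it is not a real response.

```python
# Hypothetical query returning direct child -> parent edges among the
# ancestors of GO:0007267 (including the term itself). Untested; shown
# only to indicate where EDGES would come from.
EDGE_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
SELECT DISTINCT ?childId ?parentId
FROM <http://rdf.ebi.ac.uk/dataset/go>
WHERE {
  obo:GO_0007267 rdfs:subClassOf* ?child .
  ?child rdfs:subClassOf ?parent .
  ?child oboInOwl:id ?childId .
  ?parent oboInOwl:id ?parentId .
}
"""

# Stand-in for the query result, taken from the Example 2 diagram.
EDGES = [
    ("GO:0007267", "GO:0007154"),
    ("GO:0007267", "GO:0023052"),
    ("GO:0007154", "GO:0009987"),
    ("GO:0009987", "GO:0008150"),
    ("GO:0023052", "GO:0008150"),
]


def build_parents(edges):
    """Turn (child, parent) pairs into child -> [parents]."""
    parents = {}
    for child, parent in edges:
        parents.setdefault(child, []).append(parent)
    return parents


def all_paths(parents, node):
    """Every root-to-node path, each a list ordered from the root down."""
    direct = parents.get(node, [])
    if not direct:  # no parents: node is a root
        return [[node]]
    paths = []
    for parent in direct:
        for prefix in all_paths(parents, parent):
            paths.append(prefix + [node])
    return paths


if __name__ == "__main__":
    for path in all_paths(build_parents(EDGES), "GO:0007267"):
        print(" / ".join(path))
```

With the diagram's edges this prints the two paths listed above. On the full GO graph the number of paths can grow quickly for deep terms, so the walk is kept in client code rather than forced into a single aggregate.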
My understanding is that, under the hood, the query computes the depth of each level and uses it to build the path, which works fine when a term belongs to only one branch. Here is the per-term version of the query for Example 1:
SELECT (count(?mid) as ?depth) ?midId
FROM <http://rdf.ebi.ac.uk/dataset/go>
WHERE {
  obo:GO_0032259 rdfs:subClassOf* ?mid .
  ?mid rdfs:subClassOf* ?class .
  ?mid <http://www.geneontology.org/formats/oboInOwl#id> ?midId .
}
GROUP BY ?midId
ORDER BY ?depth
The result:
depth | midId
------|------------
1 | GO:0008150
2 | GO:0008152
3 | GO:0032259
In the second example (the same query, starting from obo:GO_0007267 instead), the values get mixed up and I don't understand why. In any case, I'm sure part of the problem is the terms that share the same depth/level, but I don't know how to solve it.
depth | midId
------|------------
2 | GO:0008150
2 | GO:0009987
2 | GO:0023052
3 | GO:0007154
6 | GO:0007267
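The per-term numbers in both tables can be reproduced with a small Python model. It makes the same assumptions as before: only the diagrams' edges exist, and `rdfs:subClassOf*` yields one binding per path. Under those assumptions, the value for a given `?midId` is the number of paths from the start term up to `?mid`, multiplied by the number of upward paths from `?mid` (counting the zero-length one). In a tree both factors are simple ancestor counts, so Example 1 looks like a depth. In the DAG, GO:0008150 is reached twice from GO:0007267, and GO:0007267 reaches GO:0008150 twice, which gives 2 and 6. Even when each ancestor is counted only once (`distinct_ancestor_counts`, an illustrative helper), GO:0009987 and GO:0023052 still tie at 2. Sorting by such a count therefore cannot separate the two branches.

```python
from collections import Counter

# Edges from the diagrams (child -> direct parents); assumed complete.
EXAMPLE_1 = {
    "GO:0032259": ["GO:0008152"],
    "GO:0008152": ["GO:0008150"],
}
EXAMPLE_2 = {
    "GO:0007267": ["GO:0007154", "GO:0023052"],
    "GO:0007154": ["GO:0009987"],
    "GO:0009987": ["GO:0008150"],
    "GO:0023052": ["GO:0008150"],
}


def reach(parents, node):
    """Ancestors of node (node included), once per upward path."""
    found = [node]
    for parent in parents.get(node, []):
        found.extend(reach(parents, parent))
    return found


def per_term_counts(parents, start):
    """Model of the second query: count(?mid) grouped by ?midId, path-counting."""
    counts = Counter()
    for mid in reach(parents, start):
        counts[mid] += len(reach(parents, mid))
    return dict(counts)


def distinct_ancestor_counts(parents, start):
    """Same grouping, but each ancestor counted once (set semantics)."""
    def ancestors(node):
        return set(reach(parents, node))
    return {mid: len(ancestors(mid)) for mid in ancestors(start)}


if __name__ == "__main__":
    print(per_term_counts(EXAMPLE_1, "GO:0032259"))
    print(per_term_counts(EXAMPLE_2, "GO:0007267"))
    print(distinct_ancestor_counts(EXAMPLE_2, "GO:0007267"))
```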