I'm using a federated query to retrieve some information from a remote server, but I don't want to retrieve all the variables (select *) that I use inside the federated query. I only want to return the count variable. How can I do that?
Code:
SERVICE <https://sparql.uniprot.org/sparql/> {
?sub_bp (rdfs:subClassOf|owl:someValuesFrom)* ?bp_iri .
?protein up:classifiedWith ?sub_bp.
?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
}
If this were not a federated query, I would write it like this:
SELECT distinct (count(distinct ?protein) as ?count) WHERE {
?sub_bp (rdfs:subClassOf|owl:someValuesFrom)* ?bp_iri .
?protein up:classifiedWith ?sub_bp.
?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
}
But in the federated query I cannot select variables, so is there a way to do what I want?
** EDIT 1 **
After @TallTed's response I noticed that I had skipped some details to keep the question simple, but those details turn out to be important, so I will describe the whole situation.
I have a local data set containing triples about biological processes and genes. I have to count how many genes are related to each biological process and divide that number by the total number of proteins identified in UniProt for the same biological process (and its "children").
To do this, I first query my local data set, counting the genes for each biological process, and then I run a federated query to count all the proteins identified in UniProt for each biological process (and its "children").
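As a small worked example of the arithmetic described above (local gene count divided by the remote protein total, per biological process), here is a minimal Python sketch. The IRIs and counts are hypothetical placeholders, not real data, and the function name `coverage` is made up for illustration.

```python
def coverage(bp_counts, bp_totals):
    """Return {bp_iri: bp_count / bp_total}, skipping processes with no remote total."""
    ratios = {}
    for bp_iri, bp_count in bp_counts.items():
        bp_total = bp_totals.get(bp_iri, 0)
        if bp_total:  # avoid dividing by zero when nothing came back from the remote side
            ratios[bp_iri] = bp_count / bp_total
    return ratios


# Hypothetical numbers: genes counted locally, proteins counted remotely.
bp_counts = {"http://example.org/bp/A": 5, "http://example.org/bp/B": 6}
bp_totals = {"http://example.org/bp/A": 20, "http://example.org/bp/B": 12}

ratios = coverage(bp_counts, bp_totals)

# Same ordering as "order by DESC(?divided)" in the query.
for bp_iri in sorted(ratios, key=ratios.get, reverse=True):
    print(bp_iri, ratios[bp_iri])
```

Only the two numbers per biological process are needed for this step, which is why shipping every matching triple back from the remote endpoint is wasteful.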
The full SPARQL code:
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX up:<http://purl.uniprot.org/core/>
PREFIX owl:<http://www.w3.org/2002/07/owl#>
SELECT DISTINCT ?bp_iri ?bp_count (count(distinct ?protein) as ?bp_total) ((?bp_count / ?bp_total) as ?divided) WHERE {
{
SELECT DISTINCT ?bp_iri (COUNT(?bp_iri) as ?bp_count) WHERE{
?genes_iri a uniprot:Gene .
?genes_iri obo:RO_0000056 ?bp_iri .
}group by ?bp_iri order by DESC(?bp_count)
}
SERVICE silent <https://sparql.uniprot.org/sparql/> {
?sub_bp (rdfs:subClassOf|owl:someValuesFrom)* ?bp_iri .
?protein up:classifiedWith ?sub_bp.
?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
}
}group by ?bp_iri ?bp_count ?bp_total order by DESC(?divided)
When I run this query using Jena ARQ (a query engine), the variable ?bp_iri is replaced at the moment of the HTTP request by a specific biological process IRI (one HTTP request for each biological process), as shown in the image below:
Note that in the explain image, the federated query is selecting everything (*). The problem is that I don't want to retrieve all the relations that I'm dealing with in the federated query. I just want to retrieve the count, but the count is an aggregate function that is only allowed in the SELECT clause. (I don't want to retrieve all the relations because this query returns A LOT of triples, on the order of tens of thousands and sometimes millions, and it's not necessary to have them on my computer just to count them.)
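The per-binding behaviour described above can be pictured with a short Python sketch. This is only a rough model of what the explain output suggests (one `SELECT *` request per biological process, with ?bp_iri replaced by a concrete IRI). It is not ARQ's actual implementation, the IRIs are hypothetical placeholders, and prefix declarations are omitted.

```python
# The graph pattern from the SERVICE block, with ?bp_iri still unbound.
SERVICE_PATTERN = """\
?sub_bp (rdfs:subClassOf|owl:someValuesFrom)* ?bp_iri .
?protein up:classifiedWith ?sub_bp .
?protein up:organism <http://purl.uniprot.org/taxonomy/10090> ."""


def remote_query_for(bp_iri: str) -> str:
    """Build the query text sent for one binding of ?bp_iri (illustrative only)."""
    bound = SERVICE_PATTERN.replace("?bp_iri", f"<{bp_iri}>")
    return f"SELECT * WHERE {{\n{bound}\n}}"


# Hypothetical bindings coming from the local subquery.
bp_iris = ["http://example.org/bp/A", "http://example.org/bp/B"]

# One remote query per biological process; each returns every matching row.
remote_queries = [remote_query_for(iri) for iri in bp_iris]

print(remote_queries[0])
```

Each of these requests returns all matching ?sub_bp and ?protein rows, and the counting only happens afterwards on the local side.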
To solve this, I tried to create a subquery inside the federated query to select only the count (?bp_total) and not all the triples. Code used:
SERVICE silent <https://sparql.uniprot.org/sparql/> {
{
SELECT (count(distinct ?protein) as ?bp_total) WHERE {
?sub_bp (rdfs:subClassOf|owl:someValuesFrom)* ?bp_iri .
?protein up:classifiedWith ?sub_bp.
?protein up:organism <http://purl.uniprot.org/taxonomy/10090> .
}
}
}
Running the explain again, I noticed that when I put a subquery inside the federated query, the variable ?bp_iri is not replaced by the biological process IRI, as shown in the image below:
Considering this, how can I retrieve only the count from a federated query?
Sorry about the long post.