Stardog custom aggregate function unavailable in Jena

Question

I've created a custom aggregate function in Stardog that calculates the standard deviation. This works great when you post SPARQL queries to the endpoint or via the query panel in the admin console.

So far, so good, but we're facing a couple of problems. First of all, a query like the following executes perfectly via Stardog but fails in the SPARQL validator (and with the Jena API as well):

PREFIX  :     <http://our/namespace#>
PREFIX  agg:  <urn:aggregate:>
SELECT (agg:stardog:stdev(?age) AS ?stdLMD) (AVG(?age) AS ?avg)
WHERE {
 ?pat a :Person .
 ?pat :age ?age . 
}

Stardog gives the correct results for standard deviation and average age, but the SPARQL validator throws an exception:

Non-group key variable in SELECT: ?age in expression (?age)

Does Stardog interpret the specification differently or is this a feature I'm unaware of?

Another problem: we're using a custom aggregate function (stdev) in a CONSTRUCT query, and again that seems to work fine via the Stardog APIs. Most of our code is based on Jena, though, and it doesn't seem to recognize the custom stdev function. I guess that's because this extension is Stardog-specific and unavailable for Jena? Let me show an example. At the moment, we're executing CONSTRUCT queries via the following Jena code:

final Query dbQuery = QueryFactory.create(query.getContent());
final QueryExecution queryExec = QueryExecutionFactory.create(dbQuery, model);
queryExec.execConstruct(infModel);

As long as we're not using the aggregate function, this works like a charm. As we're constructing triples in multiple named graphs, it's very convenient to have a model available as well (which represents a named graph).

I would like to do something similar with the Stardog Java API. I've only gotten as far as:

UpdateQuery dbQuery;
try {
    dbQuery = connection.update(query.getContent());
    dbQuery.execute();
} catch (final StardogException e) {
    LOGGER.error("Cannot execute CONSTRUCT query", e);
}

The problem is that you explicitly need to specify in the CONSTRUCT query which named graph you want to manipulate. There's nothing like a Jena model that represents a part of the database so that we can avoid specifying it in the query. What would be a good approach here?

So my question is twofold: why are queries parsed differently in Stardog and is it possible to have Jena detect the custom Stardog aggregate functions? Thanks!

UPDATE

In the end, what we're trying to accomplish is to execute a CONSTRUCT query over a given named graph, but write the newly constructed triples to a different graph. In my Jena example, you can see that I'm working with two Jena models to accomplish that. How would you do this with the SNARL API? I've gotten as far as the following code snippet, but this only defines the dataset the query will be executed against, not where the triples will be written to. Any help on this is still appreciated!

UpdateQuery dbQuery;
try {
    dbQuery = connection.update(query.getContent());
    final DatasetImpl ds = new DatasetImpl();
    ds.addNamedGraph(new URIImpl(infDatasource));
    dbQuery.dataset(ds);
    dbQuery.execute();
} catch (final StardogException e) {
    LOGGER.error("Cannot execute CONSTRUCT query", e);
}

stdev can be calculated if you have a square root function: it is the square root of `?calcVAR` where `( ( ( sum(?x*?x)-sum(?x)*sum(?x)/count(?x) ) / (count(?x)-1) ) AS ?calcVAR)` — AndyS, Mar 08 '16 at 17:30
Jena has a square root function so this could have been an option. But for the sake of readability, we might be going for a custom aggregate function. Thanks for the suggestion though! — tstorms, Mar 10 '16 at 09:09
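
For reference, the arithmetic in AndyS's comment can be sketched in plain Java. This is only an illustration of the formula (sample variance from the sum and the sum of squares, then a square root), not Stardog or Jena code; the class name SampleStdev and the method name of are made up for this example.

```java
final class SampleStdev {
    private SampleStdev() {}

    /**
     * Sample standard deviation using the identity from the comment:
     * variance = (sum(x*x) - sum(x)*sum(x)/n) / (n - 1).
     */
    public static double of(double[] xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double sum = 0.0;
        double sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x;
        }
        int n = xs.length;
        double variance = (sumSq - sum * sum / n) / (n - 1);
        // Rounding can push a near-zero variance slightly below zero.
        return Math.sqrt(Math.max(variance, 0.0));
    }

    public static void main(String[] args) {
        System.out.println(of(new double[] {2, 4, 4, 4, 5, 5, 7, 9}));
    }
}
```

Note that this one-pass form can lose precision when the values are large relative to their spread, which may be one more reason to prefer a dedicated aggregate.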

score 2 · Accepted Answer · answered Mar 07 '16 at 17:28

The likely reason for the error

Non-group key variable in SELECT: ?age in expression (?age)

is that the SPARQL validator and ARQ have no idea that agg:stardog:stdev is an aggregate, so they do not interpret it that way. Syntactically, it is no different from a standard projection expression such as (?x + ?y as ?sum), as AndyS noted.

While the SPARQL spec doesn't quite preclude custom aggregates, they're not accounted for in the grammar itself. Both Stardog and Jena allow custom aggregates, albeit in different ways.

Another problem: we're using a custom aggregate function (stdev) in a CONSTRUCT query, and again that seems to work fine via the Stardog APIs. Most of our code is based on Jena, though, and it doesn't seem to recognize the custom stdev function. I guess that's because this extension is Stardog-specific and unavailable for Jena?

Yes, Jena and Stardog are distinct. Anything custom you've defined in Stardog, such as a custom aggregate, won't be available directly in Jena.

You might be constructing the model in such a way that Jena, via ARQ, is the query engine rather than Stardog. That would explain why you get exceptions indicating that Jena doesn't know about the custom aggregate you've defined within Stardog.

There's nothing like a Jena model that represents a part of the database so that we can avoid specifying it in the query. What would be a good approach here?

You can specify the active graph of a query programmatically via the SNARL API using dataset.

So my question is twofold: why are queries parsed differently in Stardog and is it possible to have Jena detect the custom Stardog aggregate functions? Thanks!

They're parsed differently because there's no standard way of defining a custom aggregate, and Stardog and Jena chose to implement it differently. Further, Jena is not aware of Stardog's custom aggregates and vice versa.

Thanks @Michael for the extensive answer. Things make sense now. Can you check the update in my question? I'm still not able to reproduce with the SNARL API what I had with the Jena API. — tstorms, Mar 08 '16 at 09:57
@tstorms Given the description of what you're trying to do, I don't see why you're not using a single SPARQL update query rather than a construct and an update. With respect to your specific issue, you probably intended to set the insert graphs for the dataset, not the read/active graph. — Michael, Mar 08 '16 at 12:04
@michael just a note that the grammar *does* mention custom aggregates, wherein it says that IRI references can be used. It's not clear to me whether a parser should accept queries with aggregates when it won't know what to do with them, but custom aggregates aren't an extension to the grammar. — Joshua Taylor, Mar 08 '16 at 13:47
@michael specifically, the spec says "Aggregate functions can be one of the built-in keywords for aggregates or a custom aggregate, which is syntactically a function call. Aggregate functions may only be used in SELECT, HAVING and ORDER BY clauses." and then that the syntax for function calls is "FunctionCall ::= iri ArgList". — Joshua Taylor, Mar 08 '16 at 13:50
right. the spec references it, but the grammar production `[127] Aggregate` doesn't account for it, it only enumerates the built-ins. so it seems like they're legal, but if you're strict about the grammar, they'd get rejected. — Michael, Mar 09 '16 at 04:17
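
The single-update approach Michael suggests in the comments above could look roughly like the sketch below. It only builds the text of a SPARQL 1.1 Update (INSERT ... WHERE with GRAPH blocks) in plain Java, reading from one named graph and writing to another. The class name, method name and graph IRIs are placeholders, and how the string is then submitted (for example via connection.update as in the question) is not shown.

```java
final class GraphToGraphUpdate {
    private GraphToGraphUpdate() {}

    /**
     * Builds an update that matches `pattern` in sourceGraph and
     * inserts `template` into targetGraph.
     * The IRIs are concatenated as-is and are not validated or escaped.
     */
    public static String build(String sourceGraph, String targetGraph,
                               String template, String pattern) {
        return "INSERT { GRAPH <" + targetGraph + "> { " + template + " } }\n"
             + "WHERE { GRAPH <" + sourceGraph + "> { " + pattern + " } }";
    }

    public static void main(String[] args) {
        // Hypothetical graph IRIs, used only for illustration.
        System.out.println(build(
            "http://example.org/graph/source",
            "http://example.org/graph/inferred",
            "?pat a <http://our/namespace#Person>",
            "?pat <http://our/namespace#age> ?age"));
    }
}
```

If the inserted triples depend on an aggregate, the aggregate would go in a subquery inside the WHERE block, in the same way as for CONSTRUCT.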

score 1 · Answer 2 · edited May 23 '17 at 12:23

Non-group key variable in SELECT: ?age in expression (?age)

Does Stardog interpret the specification differently or is this a feature I'm unaware of?

I think that you're reading the spec correctly, and that maybe the validator just doesn't recognize non-built-in aggregates. The spec says:

19.8 Grammar

… Aggregate functions can be one of the built-in keywords for aggregates or a custom aggregate, which is syntactically a function call. Aggregate functions may only be used in SELECT, HAVING and ORDER BY clauses.

As to the construct query:

Another problem: we're using a custom aggregate function (stdev) in a CONSTRUCT query, and again that seems to work fine via the Stardog APIs. Most of our code is based on Jena, though, and it doesn't seem to recognize the custom stdev function.

You didn't mention how you're using this. To use an aggregate within a construct pattern, you'd need to use a subquery. E.g., something like:

construct { ?s :hasStandardDeviation ?stddev }
where {{
  select ?s (agg:stddev(?value) as ?stddev) {
    ?s :hasSampleValue ?value 
  }
  group by ?s
}}

There are some examples of this in SPARQL functions in CONSTRUCT/WHERE. Of course, if the validator rejects the first, it probably rejects the second as well, but it looks like it should actually be legal. With Jena, you may need to make sure that you select a query language that allows extensions, but since the spec allows the custom functions (when identified by IRIs), I'd think you should be able to use the standard SPARQL 1.1 language. You are using SPARQL 1.1 and not the earlier SPARQL spec, right?

score 0 · Answer 3 · answered Mar 07 '16 at 17:18

Unless a custom aggregate is installed, the parser does not know it's an aggregate. Apache Jena ARQ does not have custom aggregates by default.

An aggregate by URI looks like a plain custom function. So if you have not installed that aggregate, the parser considers it to be a custom function.

The AVG forces an implicit grouping, so the custom function is then applied to a non-group-key variable, which is illegal.
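
To make "installing" an aggregate a bit more concrete: whichever engine hosts it, a stdev aggregate boils down to per-group state that is fed one value at a time and asked for a result at the end. Below is a minimal plain-Java sketch of that state, using Welford's online algorithm rather than the sum-of-squares formula. StdevAccumulator, add and result are hypothetical names, not Jena's or Stardog's extension interfaces, and wiring it into either engine is not shown.

```java
final class StdevAccumulator {
    private long count;   // number of values seen so far
    private double mean;  // running mean
    private double m2;    // running sum of squared deviations from the mean

    /** Feeds one value of the group into the accumulator. */
    public void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    /** Sample standard deviation, or NaN when fewer than two values were added. */
    public double result() {
        if (count < 2) {
            return Double.NaN;
        }
        return Math.sqrt(m2 / (count - 1));
    }

    public static void main(String[] args) {
        StdevAccumulator acc = new StdevAccumulator();
        for (double age : new double[] {2, 4, 4, 4, 5, 5, 7, 9}) {
            acc.add(age);
        }
        System.out.println(acc.result());
    }
}
```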
