
I would like to know if there are some functions to manipulate RDF Collections in SPARQL.

A motivating problem is the following.

Suppose you have:

@prefix : <http://example.org#> .
:x1 :value 3 .
:x2 :value 5 .
:x3 :value 6 .
:x4 :value 8 .

:list :values (:x1 :x2 :x3 :x4) .

And you want to calculate the following formula: ((Xn - Xn-1) + ... + (X2 - X1)) / (N - 1)
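As a reference point for what a query should return, here is the formula as a small Python sketch over the :value of each element, taken in list order. It is plain procedural code, not SPARQL. The function name is made up for illustration, and exact fractions are used so the result can be compared with the decimal a SPARQL engine reports.

```python
from fractions import Fraction


def mean_consecutive_difference(values):
    """((xN - xN-1) + ... + (x2 - x1)) / (N - 1) for a list of N >= 2 numbers."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    # zip(values, values[1:]) yields the adjacent pairs (x1, x2), (x2, x3), ...
    diffs = [b - a for a, b in zip(values, values[1:])]
    return Fraction(sum(diffs), len(values) - 1)


# The :value of :x1 .. :x4 in the data above.
print(mean_consecutive_difference([3, 5, 6, 8]))  # 5/3
```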

Is there some general way to calculate it?

Up until now, I was only able to calculate it for a fixed set of values. For example, for 4 values, I can use the following query:

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?r { 
 ?list :values ?ls .
 ?ls rdf:first ?x1 .
 ?ls rdf:rest/rdf:first ?x2 .
 ?ls rdf:rest/rdf:rest/rdf:first ?x3 .
 ?ls rdf:rest/rdf:rest/rdf:rest/rdf:first ?x4 .
 ?x1 :value ?v1 .
 ?x2 :value ?v2 .
 ?x3 :value ?v3 .
 ?x4 :value ?v4 .
 BIND ( ((?v4 - ?v3) + (?v3 - ?v2) + (?v2 - ?v1)) / 3 as ?r)
}
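Each path in the query above is hard-coded: element k is reached by k-1 rdf:rest steps followed by one rdf:first. The following procedural sketch performs the same navigation over a hypothetical in-memory copy of the list. The dictionaries and node labels are invented for illustration. In the sketch the step count is just a loop bound, whereas the query needs a separate, spelled-out path for each position.

```python
# Hypothetical in-memory copy of the list (:x1 :x2 :x3 :x4): each list node
# maps to its (rdf:first, rdf:rest) pair, and "rdf:nil" terminates the chain.
FIRST_REST = {
    "_:l1": (":x1", "_:l2"),
    "_:l2": (":x2", "_:l3"),
    "_:l3": (":x3", "_:l4"),
    "_:l4": (":x4", "rdf:nil"),
}
VALUE = {":x1": 3, ":x2": 5, ":x3": 6, ":x4": 8}


def nth(head, k):
    """Element k (1-based): k-1 rdf:rest steps, then one rdf:first."""
    if k < 1:
        raise IndexError(k)
    node = head
    for _ in range(k - 1):
        node = FIRST_REST[node][1]  # follow rdf:rest once
        if node == "rdf:nil":
            raise IndexError(k)     # the list is shorter than k
    return FIRST_REST[node][0]      # rdf:first


v = [VALUE[nth("_:l1", k)] for k in range(1, 5)]  # ?v1 .. ?v4
print(v)  # [3, 5, 6, 8]
```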

What I would like is some way to access the Nth value and to define some kind of recursive function to calculate that expression. I suspect it is not possible, but maybe someone has a nice solution.

Labra
  • In the particular example you've given, the addition and subtraction cancel out, so the numerator is just `?v4 - ?v1`. Is that what you intended? – Joshua Taylor Jun 26 '13 at 11:59
  • No, it was just a simplified example formula... the real formula is more complex. It calculates the mean of ?v_N / ?v_(N-1). Thanks for the comment anyway :) – Labra Jun 27 '13 at 13:14
  • OK, then, I think that the actual formula should still work using the answer I posted. Have you tried adapting the answer to your actual case yet? – Joshua Taylor Jun 27 '13 at 13:34
  • Yes, I tried, but it does not work. Notice that I have to take the pairs (xN, xN-1), (xN-1,xN-2),... – Labra Jun 27 '13 at 15:15
  • Yes, that's what's discussed in the **Update** section of my answer, particularly the first SPARQL query in that section. – Joshua Taylor Jun 27 '13 at 15:17

2 Answers


No built-ins that make formulas easier…

SPARQL does include some mathematical functions for arithmetic and aggregate computations. However, I don't know of any particularly convenient way of concisely representing mathematical expressions in SPARQL. I've been looking at a paper lately that discusses an ontology for representing mathematical objects like expressions and definitions. The authors implemented a system to evaluate these, but I don't think it used SPARQL (or at least, it wasn't just a simple extension of SPARQL).

Wenzel, Ken, and Heiner Reinhardt. "Mathematical Computations for Linked Data Applications with OpenMath." Joint Proceedings of the 24th Workshop on OpenMath and the 7th Workshop on Mathematical User Interfaces (MathUI). 2012.

…but we can still do this case.

That said, this particular case isn't too hard to do, since it's not too hard to work with RDF lists in SPARQL, and SPARQL includes the mathematical functions needed for this expression. First, a bit about RDF list representation, which will make the solution easier to understand. (If you're already familiar with this, you can skip the next paragraph or two.)

RDF lists are linked lists: each list is related to its first element by the rdf:first property, and to the rest of the list by rdf:rest. So the convenient notation (:x1 :x2 :x3 :x4) is actually shorthand for:

_:l1 rdf:first :x1 ; rdf:rest _:l2 .
_:l2 rdf:first :x2 ; rdf:rest _:l3 .
_:l3 rdf:first :x3 ; rdf:rest _:l4 .
_:l4 rdf:first :x4 ; rdf:rest rdf:nil .
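The expansion is mechanical, so a short sketch can reproduce it. Given the items of a collection, it generates one blank node per item. Each node is linked to its item with rdf:first and to the next node with rdf:rest, and the chain ends with rdf:nil (an empty collection is just rdf:nil). Triples are plain string tuples here and the _:lN labels are made up; a real parser chooses its own blank node identifiers.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def expand_collection(items):
    """Expand (i1 i2 ... in) into rdf:first/rdf:rest triples; returns (head, triples)."""
    if not items:
        return RDF_NIL, []  # the empty collection () is rdf:nil itself
    nodes = [f"_:l{i}" for i in range(1, len(items) + 1)]  # illustrative labels
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.append((node, RDF_FIRST, item))
        triples.append((node, RDF_REST, rest))
    return nodes[0], triples


head, triples = expand_collection([":x1", ":x2", ":x3", ":x4"])
for s, p, o in triples:
    print(s, p, o)
```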

Representing blank nodes with [], we can make this a bit clearer:

[ rdf:first :x1 ;
  rdf:rest [ rdf:first :x2 ;
             rdf:rest [ rdf:first :x3 ;
                        rdf:rest [ rdf:first :x4 ;
                                   rdf:rest rdf:nil ]]]]

Once the head of the list has been identified (that is, the node with rdf:first :x1), any list l reachable from it by an even number of repetitions (including 0) of rdf:rest/rdf:rest is a list whose rdf:first is an odd-numbered element of the list (counting from 1). Starting at l and going forward one rdf:rest, we're at an l' whose rdf:first is an even-numbered element of the list.
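Here is a small sketch of that reasoning over a hypothetical in-memory copy of the list. Start at the head, repeatedly take two rdf:rest steps (the analogue of (rdf:rest/rdf:rest)*, including zero steps), and read rdf:first at each stop. Pairing each stop with the node one rdf:rest further on gives the (odd, even) pairs. For a list with an odd number of elements, the last element has no partner and drops out, since rdf:nil has no rdf:first.

```python
# Hypothetical in-memory copy of (:x1 :x2 :x3 :x4).
FIRST = {"_:l1": ":x1", "_:l2": ":x2", "_:l3": ":x3", "_:l4": ":x4"}
REST = {"_:l1": "_:l2", "_:l2": "_:l3", "_:l3": "_:l4", "_:l4": "rdf:nil"}


def even_rest_steps(head):
    """List nodes reachable from head by (rdf:rest/rdf:rest)*, head included."""
    found, node = [], head
    while node in FIRST:  # rdf:nil (or anything off the list) ends the walk
        found.append(node)
        node = REST.get(REST.get(node, "rdf:nil"), "rdf:nil")  # two rdf:rest steps
    return found


stops = even_rest_steps("_:l1")
odd_elements = [FIRST[n] for n in stops]
# Mirror "rdf:rest [ rdf:first ... ]": keep a stop only if the next node has an element.
pairs = [(FIRST[n], FIRST[REST[n]]) for n in stops if REST[n] in FIRST]
print(odd_elements)  # [':x1', ':x3']
print(pairs)         # [(':x1', ':x2'), (':x3', ':x4')]
```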

Since SPARQL 1.1 property paths let us write (rdf:rest/rdf:rest)* to denote any even number of repetitions of rdf:rest, we can write the following query, which binds the :value of each odd-numbered element to ?n and the value of the following even-numbered element to ?nPlusOne. The math in the SELECT clause is straightforward, although to get N-1 we actually use 2*COUNT(*)-1, because the number of rows (each of which binds elements n and n+1) is N/2 (for an even N).

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ( SUM(?nPlusOne-?n)/(2*COUNT(*)-1) as ?result) {
 ?list :values [ (rdf:rest/rdf:rest)* [ rdf:first [ :value ?n ] ; 
                                        rdf:rest  [ rdf:first [ :value ?nPlusOne ]]]] .
}

Results (using Jena's command line ARQ):

$ arq --query query.sparql --data data.n3 
------------------------------
| result                     |
==============================
| 1.333333333333333333333333 |
------------------------------

which is what is expected since

 ((5 - 3) + (8 - 6)) / (4 - 1) = (2 + 2) / 3 = 4/3 = 1.333...
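The denominator trick is easy to check outside SPARQL. This sketch (plain Python, with a made-up function name) reproduces the query's rows, one per (odd, even) pair of values, and applies the same SUM and 2*COUNT(*)-1 arithmetic:

```python
from fractions import Fraction


def odd_even_pair_mean(values):
    """Mimic SUM(?nPlusOne-?n)/(2*COUNT(*)-1) over the (odd, even) element pairs."""
    # One row per pair (x1, x2), (x3, x4), ...; an unpaired last element is dropped.
    rows = list(zip(values[0::2], values[1::2]))
    total = sum(n_plus_one - n for n, n_plus_one in rows)  # SUM(?nPlusOne-?n)
    return Fraction(total, 2 * len(rows) - 1)              # 2*COUNT(*)-1


print(odd_even_pair_mean([3, 5, 6, 8]))  # 4/3
```

Because 2*COUNT(*)-1 equals N-1 only when N is even, a list with an odd number of elements would need a different denominator. For five values there are still only two rows, which gives 3 rather than 4.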

Update

I just realized that what I implemented above was influenced by my comment on the question (asking whether the summation was really intended, since it simplifies so easily), and it does not compute the formula as asked. That is, the query above implements

((x2 - x1) + (x4 - x3) + ... + (xN - xN-1)) / (N - 1)

whereas the original question asked for

((x2 - x1) + (x3 - x2) + ... + (xN-1 - xN-2) + (xN - xN-1)) / (N - 1)

The original is even simpler, since the pairs are identified by each rdf:rest of the original list, not just even numbers of repetitions. Using the same approach as above, this query can be represented by:

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ( SUM(?nPlusOne-?n)/COUNT(*) as ?result) {
 ?list :values [ rdf:rest* [ rdf:first [ :value ?n ] ; 
                             rdf:rest  [ rdf:first [ :value ?nPlusOne ]]]] .
}

Results:

$ arq --query query.sparql --data data.n3 
------------------------------
| result                     |
==============================
| 1.666666666666666666666666 |
------------------------------
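Procedurally, rdf:rest* [ rdf:first ?a ; rdf:rest [ rdf:first ?b ] ] visits every list node that still has a successor element. An N-element list therefore yields N-1 rows, so COUNT(*) is already N-1. Here is a sketch over a hypothetical in-memory copy of the data:

```python
from fractions import Fraction

# Hypothetical in-memory copy of (:x1 :x2 :x3 :x4) and the :value triples.
FIRST = {"_:l1": ":x1", "_:l2": ":x2", "_:l3": ":x3", "_:l4": ":x4"}
REST = {"_:l1": "_:l2", "_:l2": "_:l3", "_:l3": "_:l4", "_:l4": "rdf:nil"}
VALUE = {":x1": 3, ":x2": 5, ":x3": 6, ":x4": 8}


def adjacent_pairs(head):
    """Mimic rdf:rest* [ rdf:first ?a ; rdf:rest [ rdf:first ?b ] ]."""
    pairs, node = [], head
    while node in FIRST:
        nxt = REST.get(node)
        if nxt in FIRST:  # only nodes whose rest still has an rdf:first match
            pairs.append((FIRST[node], FIRST[nxt]))
        node = nxt
    return pairs


pairs = adjacent_pairs("_:l1")
rows = [(VALUE[a], VALUE[b]) for a, b in pairs]
result = Fraction(sum(b - a for a, b in rows), len(rows))  # SUM(...)/COUNT(*)
print(len(rows), result)  # 3 5/3
```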

Of course, since the expression can be simplified to

(xN - x1) / (N - 1)

we can also just use a query that binds ?x1 to the value of the first element of the list, ?xn to the value of the last element, and ?xi to the value of each element of the list, so that COUNT(?xi), like COUNT(*), is the number of items in the list:

prefix : <http://example.org#> 
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT (((?xn-?x1)/(COUNT(?xi)-1)) as ?result) WHERE {
 ?list :values [ rdf:rest*/rdf:first [ :value ?xi ] ;
                 rdf:first [ :value ?x1 ] ;
                 rdf:rest* [ rdf:first [ :value ?xn ] ; 
                             rdf:rest  rdf:nil ]] .
}
GROUP BY ?x1 ?xn

Results:

$ arq --query query.sparql --data data.n3 
------------------------------
| result                     |
==============================
| 1.666666666666666666666666 |
------------------------------
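The simplification is a telescoping sum: each intermediate value is added once and subtracted once, leaving xN - x1. The following sketch (function names made up) compares the summed form with the collapsed form that the last query computes from ?x1, ?xn and COUNT(?xi):

```python
from fractions import Fraction


def summed(values):
    """((x2 - x1) + (x3 - x2) + ... + (xN - xN-1)) / (N - 1)."""
    return Fraction(sum(b - a for a, b in zip(values, values[1:])), len(values) - 1)


def collapsed(values):
    """(xN - x1) / (N - 1), using only the first value, last value, and count."""
    return Fraction(values[-1] - values[0], len(values) - 1)


for vals in ([3, 5, 6, 8], [10, 2, 7], [1, 1, 4, 9, 16]):
    assert summed(vals) == collapsed(vals)
print(collapsed([3, 5, 6, 8]))  # 5/3
```

This shortcut works only because the terms are differences. A sum of ratios such as x2/x1 + x3/x2 does not collapse the same way.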
Joshua Taylor
  • What a nice answer! I'm starting to follow the SWI-Prolog path to RDF, and your insight seems valuable. – CapelliC Jun 26 '13 at 13:05
  • @CapelliC If you're interested in Prolog-like approaches to working with RDF, [Apache Jena's inference support](http://jena.apache.org/documentation/inference/) includes forward and backward chaining reasoners. The backward chaining reasoner is a “tabled datalog engine”. The forward chaining reasoner can also add new backward chaining rules to the engine. You may find it interesting. – Joshua Taylor Jun 26 '13 at 13:09
  • Thanks, Joshua. SWI-Prolog has a peculiar way of doing RDF. It's more an alternative to Jena than a plugin. For instance, full SPARQL 1.1 is written in Prolog, available 'embedded'. – CapelliC Jun 26 '13 at 13:22
  • Very nice answer. However, it is not really what I wanted. I had simplified my motivating formula; my actual formula is: `(xN / xN-1 + xN-1 / xN-2 + ... + x2 / x1) / (N - 1)`. In that case, your approach does not seem to solve it, because I need to take the pairs (xN, xN-1) and divide them... Any other idea? – Labra Jun 27 '13 at 13:49
  • @Labra If you need to divide a pair instead of subtracting, doesn't that mean you just change the `?nPlusOne-?n` in the SELECT projection to `?nPlusOne/?n`? The particular arithmetic expression wasn't really the concern here; SPARQL already provides the arithmetic operations, and the question was about getting pairs out of the list, which I think this answer demonstrates how to do. – Joshua Taylor Jun 27 '13 at 13:59
  • Yes, I tried it and it works. Sorry, I was misled into thinking that it was adding all the odd numbers and subtracting them from the even ones. As I said previously, very nice answer! :) – Labra Jun 27 '13 at 15:30
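As the reply above says, switching from differences to ratios changes only the per-pair expression, not the way pairs are extracted. Here is a plain-Python sketch of the ratio formula (function name made up). It can serve as a reference value when checking a query that projects SUM(?nPlusOne/?n)/COUNT(*) over the pairs from the rdf:rest* query:

```python
from fractions import Fraction


def mean_consecutive_ratio(values):
    """(xN/xN-1 + ... + x2/x1) / (N - 1): ?nPlusOne/?n instead of ?nPlusOne-?n."""
    ratios = [Fraction(b, a) for a, b in zip(values, values[1:])]  # one per adjacent pair
    return sum(ratios) / len(ratios)                               # len(ratios) == N - 1


print(mean_consecutive_ratio([3, 5, 6, 8]))  # 7/5
```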

You may also want to look at alternative ways of describing/representing lists in RDF, e.g., with the help of the Ordered List Ontology. I think this model makes it easier to query what you want ;)
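An explicit position can help, as this sketch of a hypothetical index-based layout shows: each item carries an integer index, loosely in the spirit of a slot-and-index model. The (index, value) tuples are made up and are not the Ordered List Ontology's actual vocabulary. The point is only that consecutive items become a simple join on i and i + 1, with no path traversal:

```python
from fractions import Fraction

# Hypothetical (index, value) records for :x1 .. :x4; not actual ontology terms.
slots = [(1, 3), (2, 5), (3, 6), (4, 8)]
value_at = dict(slots)

# Consecutive pairs: a join of index i with index i + 1.
pairs = [(value_at[i], value_at[i + 1]) for i in sorted(value_at) if i + 1 in value_at]
result = Fraction(sum(b - a for a, b in pairs), len(pairs))
print(pairs, result)  # [(3, 5), (5, 6), (6, 8)] 5/3
```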

zazi