Hi, I have query results like this:

?name ?o ?x
------------
ABCD  xyz ghh
PQR   xyz hij

How do I combine the columns ?o and ?x into one column called ?papers? I need the output to look like this:

?name ?papers
--------------
ABCD  (xyz, ghh, hij)

Note that the rows for ABCD and PQR are merged into a single row under ABCD. ABCD and PQR have the same value for the property mbox_sha1sum.

The two tables above are examples: the first shows what I get now, the second what I need.

This is my current SPARQL query:

PREFIX xmlns: <http://xmlns.com/foaf/0.1/> 
PREFIX ontoware: <http://swrc.ontoware.org/ontology#>
SELECT DISTINCT ?name ?x ?o
WHERE { 
 ?s xmlns:mbox_sha1sum ?hash.
 ?s xmlns:made ?o.
 ?s xmlns:name ?name.
 ?o ontoware:year "2009".
 ?r xmlns:mbox_sha1sum ?hash.
 ?r xmlns:made ?x.
 ?x ontoware:year "2008".
}

Basically, I need to combine ?o and ?x into one column called ?papers. If it helps, ?s and ?r have different IRIs.

  • possible duplicate of [Aggregating results from SPARQL query](http://stackoverflow.com/questions/18212697/aggregating-results-from-sparql-query) – Joshua Taylor Dec 03 '13 at 13:39
  • The linked duplicate exactly answers this question. Most of its answer describes how to do the concatenation. The last paragraph addresses the issue of removing duplicate results by using `group_concat( distinct ?var ; ... )`. – Joshua Taylor Dec 03 '13 at 14:00

1 Answer

You can do this with a combination of grouping and aggregates like so:

PREFIX xmlns: <http://xmlns.com/foaf/0.1/> 
PREFIX ontoware: <http://swrc.ontoware.org/ontology#>
# STR() is a precaution: CONCAT() expects string arguments, and ?x and ?o may be IRIs.
SELECT (SAMPLE(?name) AS ?GroupName) (GROUP_CONCAT(CONCAT(STR(?x), ", ", STR(?o)) ; SEPARATOR = ", ") AS ?Papers)
WHERE 
{ 
  ?s xmlns:mbox_sha1sum ?hash.
  ?s xmlns:made ?o.
  ?s xmlns:name ?name.
  ?o ontoware:year "2009".
  ?r xmlns:mbox_sha1sum ?hash.
  ?r xmlns:made ?x.
  ?x ontoware:year "2008".
} GROUP BY ?hash

The GROUP BY clause groups results together by the ?hash variable. Because you've grouped only by this variable, you can't select ?name directly (as you've shown, there can be multiple values for it), so instead you must use SAMPLE(?name) to get one of the possible names (with no guarantee as to which one you get).

Then you can use the GROUP_CONCAT() aggregate, which concatenates all values of the given expression within the group. Since you actually have two values to combine, you need to use the CONCAT() function as your expression.

Bear in mind that this won't give you precisely what you want; rather, you'll get something like the following:

?GroupName | ?Papers
--------------------------------
ABCD       | xyz, ghh, xyz, hij

Eliminating the duplicate paper entries is potentially possible but likely to make your query much more complicated. It may be easier to eliminate the duplicates by post-processing the ?Papers value in Java.
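
As a rough illustration of that post-processing step, here is a minimal sketch. It assumes the ?Papers value arrives as a plain string joined with ", " (the separator used in the query) and that no paper identifier contains that separator itself. The class and method names (PaperListDedup, dedupe) are hypothetical and not part of any SPARQL library.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Jena or any SPARQL API.
final class PaperListDedup {

    // Splits a ", "-separated list, drops repeated entries, and re-joins it.
    // Assumption: entries never contain the ", " separator themselves.
    static String dedupe(String papers) {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        Set<String> unique = new LinkedHashSet<>();
        for (String paper : papers.split(", ")) {
            String trimmed = paper.trim();
            if (!trimmed.isEmpty()) {
                unique.add(trimmed);
            }
        }
        return String.join(", ", unique);
    }
}
```

Calling PaperListDedup.dedupe("xyz, ghh, xyz, hij") returns "xyz, ghh, hij". A LinkedHashSet is used rather than a HashSet so that the papers stay in the order in which they first appeared.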

RobV
  • Duplicates are easy to eliminate: `group_concat( distinct ?x; ...)`. – Joshua Taylor Dec 03 '13 at 14:01
  • This somewhat produces the answer I'm looking for. Thank you @RobV. As you said, it is difficult to eliminate duplicates. Using distinct in group_concat produces an error. I'm going to wait to see if someone gives me an answer with a distinct paper listing, or else mark this as the answer. Thanks again =) – Prasana Venkat Ramesh Dec 03 '13 at 15:23
  • @PrasanaVenkatRamesh Can you clarify what kind of error you're getting? The answer at http://stackoverflow.com/questions/18212697/aggregating-results-from-sparql-query shows how `group_concat(distinct ...)` works… – Joshua Taylor Dec 04 '13 at 03:58
  • @PrasanaVenkatRamesh In re-reading your question, I think I may have misread something the first time… – Joshua Taylor Dec 04 '13 at 03:59