I need to find all triples on DBpedia where http://dbpedia.org/resource/Benin is the subject or the object. This query gives me the output I want, in the format that works best for me (just three variables and no blank cells):

PREFIX : <http://dbpedia.org/resource/>
SELECT * WHERE {
?s ?p ?o
FILTER (?s = :Benin || ?o = :Benin)
}

I get similar results with this query:

PREFIX : <http://dbpedia.org/resource/>
SELECT * WHERE {
{:Benin ?p ?o}
UNION
{?s ?p :Benin}
}

However, the formatting of the latter is off. It first gives me the p and o output with s left blank, and then s and p with o left blank. Also, the first query takes more time to execute. I would be grateful for an explanation of how the two queries work and why their output differs.

TallTed
kurious

2 Answers

However, the formatting of the latter is off

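That's because the two branches of the UNION bind different variables, while SELECT * projects every variable that appears anywhere in the pattern. The first branch binds only ?p and ?o, the second only ?s and ?p, so every row has one unbound column. UNION just concatenates the two solution sets; it does not fill in the missing variable.

You can resolve the problem by using all three variables in both branches and selecting them explicitly: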

PREFIX : <http://dbpedia.org/resource/>
SELECT ?s ?p ?o WHERE {
   {
       ?s ?p ?o .
       FILTER (?s = :Benin)
   }
   UNION
   {
       ?s ?p ?o .
       FILTER (?o = :Benin)
   }
}

Note that this is still much faster on DBpedia than the OR filter.
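
As an alternative sketch (not something I have timed against the public endpoint, so any speed difference is an assumption), you can keep the two triple patterns from the question and use BIND from SPARQL 1.1 to copy :Benin into the position that would otherwise be unbound:

```sparql
PREFIX : <http://dbpedia.org/resource/>
SELECT ?s ?p ?o WHERE {
   {
       # :Benin as subject; copy it into ?s so the column is not blank
       :Benin ?p ?o .
       BIND (:Benin AS ?s)
   }
   UNION
   {
       # :Benin as object; copy it into ?o
       ?s ?p :Benin .
       BIND (:Benin AS ?o)
   }
}
```

Each branch now binds all three variables, so the rows line up in the same three columns as the FILTER version.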

The union returns duplicates when a triple matches both filter expressions (i.e. :Benin ?p :Benin). SELECT DISTINCT would remedy that at additional cost; since no such triple appears to exist here, I omitted it for better performance.
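
If you want to check for such a self-referencing triple before deciding on DISTINCT, an ASK query is a cheap test. A minimal sketch:

```sparql
PREFIX : <http://dbpedia.org/resource/>
# true only if some triple has :Benin as both subject and object
ASK { :Benin ?p :Benin }
```

If it answers true, changing SELECT to SELECT DISTINCT in the UNION query removes the repeated rows.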

Also, the first query takes more time to execute.

That's hard to say without the result of an EXPLAIN(), but my first guess would be that the equality filter is using the index, while the OR filter is using a full table scan. Virtuoso does not seem to generate good query plans for nested filters.

Stanislav Kralin
dhke

Try this --

PREFIX : <http://dbpedia.org/resource/>
DESCRIBE  :Benin

-- or just --

DESCRIBE  <http://dbpedia.org/resource/Benin>

You can get the output in various other serializations, including N-Triples.
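
As a sketch of one way to pick the serialization inside the query itself: this assumes the endpoint runs Virtuoso, since the define line is a Virtuoso-specific pragma rather than standard SPARQL. The value "NT" for N-Triples is my assumption; check the Virtuoso documentation for the accepted names.

```sparql
# Virtuoso-specific pragma; it must come before the query proper.
define output:format "NT"
DESCRIBE <http://dbpedia.org/resource/Benin>
```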

TallTed
  • This is beautiful :-) – kurious Feb 09 '16 at 03:58
  • Can you please demonstrate how one can choose the serialization format? – kurious Feb 09 '16 at 14:02
  • Also, the official SPARQL 1.1 documentation says this about DESCRIBE: "The DESCRIBE form returns a single result RDF graph containing RDF data about resources...The description is determined by the query service." Given this, can we assume that DBpedia furnishes all information about a resource through DESCRIBE (especially when the result has more than 2k triples)? – kurious Feb 09 '16 at 14:05
  • The public DBpedia endpoint has result set size limits, across all functionality. If you want unlimited results, you will need to pursue [authenticated access](http://wiki.dbpedia.org/OnlineAccess), or [get your own instance](http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtPayAsYouGoEBSBackedAMIDBpedia2015). `DESCRIBE` output may not exactly match the `SELECT` you started with, for various reasons; follow-up on this is probably best brought to the [Virtuoso Users mailing list](https://lists.sourceforge.net/lists/listinfo/virtuoso-users/). – TallTed Feb 09 '16 at 15:42
  • Serialization format may be chosen through the menu on the SPARQL submission form (currently the most complete list), or with the `&format=` URL argument, or [the `define output:format` pragma](http://docs.openlinksw.com/virtuoso/rdfsparql.html#rdfcontrollingsparqloutputtypes) within a SPARQL query. – TallTed Feb 09 '16 at 16:11