
I am trying to compile (an excerpt of) a list of all properties in a triple store via a SPARQL endpoint. Each of the following two queries yields promising results:

A:

SELECT DISTINCT ?prop
WHERE {
  [] ?prop [].
}
LIMIT 25

B:

SELECT DISTINCT ?prop
WHERE {
  ?prop a rdf:Property.
}
LIMIT 25

As expected/hoped for, there are some items that appear in the result sets of both queries. Hence, combining the restrictions should, based on my current understanding of SPARQL, yield those items again:

C:

SELECT DISTINCT ?prop
WHERE {
  [] ?prop [].
  ?prop a rdf:Property.
}
LIMIT 25

But actually, this query hardly yields any results. Why is that?

I do not recognize what I am doing wrong, and answers to this question as well as to that question seem to suggest an analogous technique of combining those two (theoretically redundant, in very tidy ontologies) restrictions as the way to go.


Test cases:

Interestingly, the SPARQL endpoints by JES & Co. and by the Alpine Ski Racers of Austria behave as I would expect: the result set of query C is non-empty and filled (up to the LIMIT I imposed) with properties that are also returned by queries A and B.

So: Why isn't what should be the intersection of two result sets actually the intersection? Are the described endpoints that buggy (unlikely ...), or is my understanding of SPARQL flawed there (likely)?

Stanislav Kralin
O. R. Mapper
  • I'm not sure whether it explains these results, but there are reasons that the intersection of `[] ?prop []` and `?prop a rdf:Property` isn't the same as their union: being declared a property (with `a rdf:Property`) doesn't imply that the property is used as a property in the dataset, and being used as a property in the dataset doesn't imply that the property is declared as a property. In general, their intersection can be smaller than their union. This _does_ come into play on DBpedia, where there are properties that aren't declared as `rdf:Property`s. – Joshua Taylor Oct 22 '13 at 12:45
  • @JoshuaTaylor: Thanks for the explanations. However, I am aware that the union of **A** and **B** wouldn't be the same as either **A** or **B**; I am wondering why the *intersection* of **A** and **B** does not comprise all elements that are in both **A** *and* **B**. – O. R. Mapper Oct 22 '13 at 12:50
  • Yes, I did understand that (and saw the commentary on the earlier, now deleted, answer), but was focusing on the beginning of the question "I am trying to compile (an excerpt of) a list of all properties in a triple store via a SPARQL endpoint [with two promising queries]", since I expect that some people finding the question will be looking for the union of the results. Why the intersection isn't getting returned _is_ even more puzzling. – Joshua Taylor Oct 22 '13 at 12:53
  • @JoshuaTaylor: Ah, right. I'm not convinced my question was well-worded in that respect. Good idea to clarify that then. (My actual use-case is somewhat different; I started out trying to get all predicates along with certain labels, and when that yielded empty result sets even though I could see entities that would fulfil both requirements, I removed some clutter by presenting a less complex query here in the question that would reproduce the same issue.) – O. R. Mapper Oct 22 '13 at 12:57
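
For readers who, as raised in the comments above, want every property that is either used as a predicate or declared as an `rdf:Property`, a UNION of the two patterns is one way to express that. This is only a sketch: the PREFIX line is added on the assumption that the endpoint does not predefine `rdf:`, and the LIMIT simply mirrors the queries in the question.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Properties that are used as a predicate OR declared as rdf:Property.
SELECT DISTINCT ?prop
WHERE {
  { [] ?prop [] . }            # used as a predicate somewhere in the data
  UNION
  { ?prop a rdf:Property . }   # explicitly declared as a property
}
LIMIT 25
```

Without an ORDER BY, which 25 rows come back is up to the endpoint, so two runs need not return the same excerpt.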

1 Answer


DBpedia: Queries A and B both return 24 properties such as !bgcolor, !logo or #FuelElements. Yet, the result set of query C is empty.

It could be due to DBpedia having timeouts.

WHERE {
   [] ?prop [].
   ?prop a rdf:Property .
}

is potentially quite an expensive query, depending on the execution strategy. The first part says "get all triples". When the query is just

SELECT DISTINCT ?prop
WHERE {
  [] ?prop [].
}
LIMIT 25

the results are a restricted stream: evaluation can stop as soon as 25 distinct properties have been found. When you add the ?prop a rdf:Property pattern, a DB join is needed to find the ?prop values in common. As it's a rather unusual pattern, there may be less support from the optimizer.

At the moment, I get timeouts on

SELECT  ?prop
WHERE {
   ?prop a rdf:Property .
   [] ?prop [].
}
LIMIT 1
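
A variant that may be worth trying when the join times out is to start from the declared properties and only test each one for use as a predicate with FILTER EXISTS (SPARQL 1.1). This is a sketch, not a guaranteed fix: it should select the same set of properties as query C, but whether it is any cheaper depends entirely on the endpoint's optimizer. The PREFIX line assumes `rdf:` is not predefined by the endpoint.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Start from the (usually small) set of declared properties,
# then keep only those that occur as a predicate at least once.
SELECT DISTINCT ?prop
WHERE {
  ?prop a rdf:Property .
  FILTER EXISTS { [] ?prop [] . }
}
LIMIT 25
```

The existence check only needs one matching triple per candidate, which in principle spares the engine from enumerating every triple before joining.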
AndyS
  • This makes sense for DBpedia, but doesn't explain the results on the literature endpoint, where the results seem to come back _very_ quickly. – Joshua Taylor Oct 22 '13 at 12:36