I have set up a Virtuoso server (version 07.20.3217, built Jan 5 2017) that hosts Freebase data. I would really appreciate it if you could give it a try.
Consider this scenario: find the largest location (probably a county, denoted by ?var1) in the state of Wisconsin (fb:m.0824r), where ?var1 contains at least one location (denoted by ?var2) of type fb:location.place_with_neighborhoods.
I wrote the SPARQL query as follows:
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT DISTINCT ?var1 ?var2 ?v2_name WHERE {
fb:m.0824r fb:location.location.contains ?var1 .
?var1 fb:location.location.contains ?var2 .
?var2 fb:type.object.type fb:location.place_with_neighborhoods .
?var1 fb:location.location.area ?area .
OPTIONAL { ?var2 fb:type.object.name ?v2_name } .
} ORDER BY DESC(?area)
LIMIT 1
Unfortunately, Virtuoso still had not returned a result after more than one hour.
I then tried two simpler queries, each of which returns results in under one second:
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT DISTINCT ?var1 ?var2 ?v2_name WHERE {
fb:m.0824r fb:location.location.contains ?var1 .
?var1 fb:location.location.contains ?var2 .
?var2 fb:type.object.type fb:location.place_with_neighborhoods .
OPTIONAL { ?var2 fb:type.object.name ?v2_name } .
}
# Removed the area pattern on ?var1 (and ORDER BY/LIMIT)
# Returns only ONE result, in 0.05 s.
and:
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT DISTINCT ?var1 ?var2 ?v2_name ?area WHERE {
fb:m.0824r fb:location.location.contains ?var1 .
?var1 fb:location.location.contains ?var2 .
?var1 fb:location.location.area ?area .
OPTIONAL { ?var2 fb:type.object.name ?v2_name } .
}
# Removed the type constraint on ?var2 (and ORDER BY/LIMIT)
# Returns ~7000 results in ~1 s.
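One variant I have not been able to confirm is sketched below. It wraps the fast, type-filtered part (the first simpler query) in a subquery and only then attaches ?area and sorts. The sketch assumes Virtuoso evaluates the inner SELECT before joining the outer patterns. The SPARQL specification does not require this, and the optimizer may still reorder the patterns.

```sparql
PREFIX fb: <http://rdf.freebase.com/ns/>
# Untested sketch: compute the selective ?var1/?var2 pairs first,
# then look up ?area only for those candidates.
SELECT DISTINCT ?var1 ?var2 ?v2_name ?area WHERE {
  {
    # Same patterns as the first simpler query (fast, one result)
    SELECT DISTINCT ?var1 ?var2 WHERE {
      fb:m.0824r fb:location.location.contains ?var1 .
      ?var1 fb:location.location.contains ?var2 .
      ?var2 fb:type.object.type fb:location.place_with_neighborhoods .
    }
  }
  # Attach the area and optional name to the candidates from the subquery
  ?var1 fb:location.location.area ?area .
  OPTIONAL { ?var2 fb:type.object.name ?v2_name }
}
ORDER BY DESC(?area)
LIMIT 1
```

If the subquery is evaluated first, the outer join and ORDER BY only have to handle the single ?var1 that the first simpler query returned.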
Given how fast these simpler queries are, I'm confused about which part of the original query causes the slowdown. Can anyone give me some advice? Thank you so much!