
I want to query the movies that share the highest number of types with the movie The Matrix.

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?movie_name (count(distinct ?atype) as ?numatype)
FROM <http://dbpedia.org/>
WHERE {

?movie rdf:type dbo:Film;
       rdf:type ?ftype.

dbr:The_Matrix rdf:type ?ttype.

?atype a owl:class;
       owl:intersectionOf [?ftype ?ttype].


?movie rdfs:label ?movie_name.
FILTER (LANG(?movie_name)="en").
}
GROUP BY ?movie_name 
ORDER BY DESC(?numatype)
LIMIT 100

I defined ?ttype as a type of The Matrix and ?ftype as a type of ?movie.

When I run this query at http://dbpedia.org/sparql, there are no results.

DSaad
  • That's the wrong direction. Subqueries are executed first, then the outer query by using the variables that you SELECT in the subquery. Not vice versa. – UninformedUser Jul 16 '17 at 10:38
  • And what do you expect to achieve with the `owl:intersectionOf` triple pattern? SPARQL is about pattern matching, I'm not aware of such data in DBpedia. – UninformedUser Jul 16 '17 at 10:39
  • @AKSW: On dbpedia.org, each movie has a list of classes under rdf:type. I want to find the movies that share the most types with The Matrix, then order them by the number of shared types. For example, The Matrix has 44 classes and The Matrix Reloaded has 46, but they share only 36. – DSaad Jul 16 '17 at 10:48
  • @AKSW I edited my code. – DSaad Jul 16 '17 at 11:24
  • That doesn't matter. The RDF data is nevertheless modeled differently, by having a bunch of `rdf:type` triples for each type that you see: `movie1 rdf:type type1. movie1 rdf:type type2 . ` etc – UninformedUser Jul 16 '17 at 13:03
  • Is this part of an assignment or some group exercise? It seems very similar to another question, [How to find similar content using SPARQL](https://stackoverflow.com/questions/21290186/how-to-find-similar-content-using-sparql), which also asked about finding properties common to The Matrix and other films. – Joshua Taylor Jul 18 '17 at 12:47

1 Answer


The idea is to use a simple join on the types:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>

SELECT (SAMPLE(?l) as ?movie_name) 
       (count(distinct ?ttype) as ?numSharedTypes) 
WHERE {
  VALUES ?s {dbr:The_Matrix}
  ?s a ?ttype .
  ?movie a dbo:Film ;
         a ?ttype .
  FILTER(?movie != ?s)
  ?movie rdfs:label ?l .
  FILTER (LANGMATCHES(LANG(?l), 'en'))
}
GROUP BY ?movie
ORDER BY desc(?numSharedTypes)
LIMIT 100
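
To make explicit what the join in the query above computes: for each film, the count is the size of the intersection between its set of rdf:type values and the set for The Matrix. Here is a minimal Python sketch of that idea on made-up toy data. The type names and the `shared_type_counts` helper are assumptions for illustration only, not real DBpedia data or part of any library.

```python
from collections import defaultdict

# Toy (subject, type) pairs standing in for rdf:type triples.
# These values are invented for illustration, not real DBpedia data.
TYPE_TRIPLES = [
    ("The_Matrix", "Film"),
    ("The_Matrix", "Work"),
    ("The_Matrix", "CyberpunkFilm"),
    ("The_Matrix_Reloaded", "Film"),
    ("The_Matrix_Reloaded", "Work"),
    ("The_Matrix_Reloaded", "CyberpunkFilm"),
    ("Timecop", "Film"),
    ("Timecop", "Work"),
    ("Keanu_Reeves", "Person"),
]


def shared_type_counts(triples, reference, required_type="Film"):
    """Count, per subject, the types it shares with `reference`."""
    types_of = defaultdict(set)
    for subject, rdf_type in triples:
        types_of[subject].add(rdf_type)

    reference_types = types_of[reference]
    counts = {}
    for subject, types in types_of.items():
        # Mirrors `?movie a dbo:Film` and `FILTER(?movie != ?s)`.
        if subject == reference or required_type not in types:
            continue
        counts[subject] = len(types & reference_types)
    # Mirrors ORDER BY DESC(?numSharedTypes); ties are broken by name.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


print(shared_type_counts(TYPE_TRIPLES, "The_Matrix"))
```

Since every matching film is itself a Film, that type always counts as one shared type, just as dbo:Film does in the SPARQL query.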

The join itself can be expensive, so you might get a timeout or, because of Virtuoso's anytime query feature, an incomplete result.

It looks like the query optimizer isn't smart enough here; fetching the labels in particular hurts performance. Using nested sub-SELECTs makes the query much faster, although it becomes harder to read:

PREFIX  dbo:  <http://dbpedia.org/ontology/>
PREFIX  rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX  rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX  dbr:  <http://dbpedia.org/resource/>

SELECT  ?movie_name ?numSharedTypes
WHERE
  { ?movie  rdfs:label  ?l
    FILTER langMatches(lang(?l), "en")
    BIND(replace(replace(str(?l), "\\(film\\)$", ""), "[^0-9]*\\sfilm\\)$", ")") AS ?movie_name)
    { SELECT  ?movie (COUNT(?type) AS ?numSharedTypes)
      WHERE
        { ?movie  rdf:type  dbo:Film ;
                  rdf:type  ?type
          { SELECT  ?type
            WHERE
              { dbr:The_Matrix rdf:type  ?type
              }
          }
          FILTER ( ?movie != dbr:The_Matrix )
        }
      GROUP BY ?movie
      ORDER BY DESC(?numSharedTypes) ASC(?movie)
      LIMIT   100
    }
  }
ORDER BY DESC(?numSharedTypes) ASC(?movie_name)

Result (chunk):

+------------------------+----------------+
|       movie_name       | numSharedTypes |
+------------------------+----------------+
| The Matrix Reloaded    |             36 |
| The Matrix Revolutions |             33 |
| The Matrix (franchise) |             30 |
| Demolition Man         |             28 |
| Freejack               |             28 |
| Conspiracy Theory      |             27 |
| Deep Blue Sea (1999)   |             27 |
| Fair Game (1995)       |             27 |
| Judge Dredd            |             27 |
| Revenge Quest          |             27 |
| Screamers (1995)       |             27 |
| Soldier (1998)         |             27 |
| The Invasion           |             27 |
| Timecop                |             27 |
| Total Recall (1990)    |             27 |
| V for Vendetta         |             27 |
| Assassins              |             26 |
| ...                    |            ... |
+------------------------+----------------+
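
The movie_name values in this table come from the two nested replace() calls in the BIND. Here is a small Python sketch of the same two substitutions, assuming labels of the form "Title (1999 film)" or "Title (film)" and assuming Python's `re` treats these particular patterns the same way as the SPARQL regex engine. The `clean_label` helper is a hypothetical name used only for illustration.

```python
import re


def clean_label(label: str) -> str:
    """Apply the same two substitutions as the nested replace() calls."""
    # Inner replace: drop a trailing "(film)" disambiguator.
    label = re.sub(r"\(film\)$", "", label)
    # Outer replace: turn e.g. "(1999 film)" into "(1999)".
    return re.sub(r"[^0-9]*\sfilm\)$", ")", label)


for raw in ["Deep Blue Sea (1999 film)", "Judge Dredd (film)", "Timecop"]:
    print(repr(clean_label(raw)))
```

Note that removing "(film)" leaves the space that preceded it, so such names end with a trailing space unless they are trimmed separately.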
UninformedUser
  • Thank you very much. I worked out an answer with your help, but I had to increase the execution timeout to get a result at http://dbpedia.org/sparql. – DSaad Jul 16 '17 at 18:13
  • @AKSW, the list of films you end up with is very similar to the ones that came up in [this answer](https://stackoverflow.com/a/21290432/1281433) (for 2014, even!) – Joshua Taylor Jul 18 '17 at 12:51