
I'm trying to get the most famous movies in the world from Wikidata with SPARQL.

I have the following query:

```sparql
SELECT ?item WHERE {
  ?item wdt:P31 wd:Q11424.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
```

which returns all movies (about 214,143).
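
If you only want to know how many items the query above matches, a COUNT over the same triple pattern avoids transferring every row. This is a minimal sketch for the Wikidata Query Service, which predefines the `wd:` and `wdt:` prefixes. Wikidata is edited continuously, so the number it returns will not necessarily match the figure above.

```sparql
# Count how many items are instances of film (Q11424),
# without returning the items themselves.
SELECT (COUNT(?item) AS ?count) WHERE {
  ?item wdt:P31 wd:Q11424 .
}
```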

I only need movies that have articles in more than, say, 10 language editions of Wikipedia, since I'm guessing these will be the most famous ones.

Is there a way to do this inside the query itself, without checking every entry?

Dylan
  • Interesting definition of "famous"… Note that Wikidata content is not the same as Wikipedia content, so the number of languages used about something on Wikidata may be more or fewer than the number of languages used about it on Wikipedia. – TallTed Jan 27 '18 at 17:12
  • I realize that, but I think the number of languages (labels) is still quite useful for judging popularity. – Dylan Jan 27 '18 at 18:13
  • It is possible to retrieve the so-called sitelinks and then count them for every movie. Fortunately, the number of sitelinks is a precalculated value: https://stackoverflow.com/a/46797845/7879193. – Stanislav Kralin Jan 27 '18 at 18:41
  • I'd also use some other measure for movies, but if you're happy with it: `SELECT ?item (count(distinct ?lang) as ?langCnt) WHERE { ?item wdt:P31 wd:Q11424. ?item rdfs:label ?label . bind(lang(?label) as ?lang) } group by ?item having (count(distinct ?lang) > 10)` – UninformedUser Jan 27 '18 at 19:24
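
The label-counting approach from the last comment can also be ranked and cut off, so that the most widely labelled films come first. This is a sketch, not a tested query: grouping the labels of every film may be slow and could run into the public endpoint's 60-second timeout. As the comments note, label counts measure coverage on Wikidata, not on Wikipedia. The `?langCnt` name and the `LIMIT 100` are arbitrary choices.

```sparql
# Rank films by the number of languages they have a label in.
# COUNT(DISTINCT ?lang) counts languages, not label strings.
SELECT ?item (COUNT(DISTINCT ?lang) AS ?langCnt) WHERE {
  ?item wdt:P31 wd:Q11424 ;
        rdfs:label ?label .
  BIND (LANG(?label) AS ?lang)
}
GROUP BY ?item
# Repeat the aggregate here: standard SPARQL does not let HAVING see the alias.
HAVING (COUNT(DISTINCT ?lang) > 10)
ORDER BY DESC(?langCnt)
LIMIT 100
```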

1 Answer


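A naive answer to your question is the query below. The `hint:Query hint:optimizer "None"` line is a Blazegraph query hint that turns off the join-order optimizer, so the triple patterns are evaluated in the order they are written: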

```sparql
SELECT ?movie (count(?wikipage) AS ?count) WHERE {
  hint:Query hint:optimizer "None" .
  ?movie wdt:P31 wd:Q11424 .
  ?wikipage schema:about ?movie .
  ?wikipage schema:isPartOf/wikibase:wikiGroup "wikipedia"
} GROUP BY ?movie HAVING (?count > 10) ORDER BY DESC(?count)
```

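
The counting query returns only item IDs. One way to also get titles is to wrap it in a subquery and call the label service afterwards, so labels are fetched only for films that pass the threshold. This is an untested sketch assuming the Wikidata Query Service. The optimizer hint is copied from the query above; whether it still helps inside a subquery is an assumption. The `LIMIT 50` is arbitrary.

```sparql
SELECT ?movie ?movieLabel ?count WHERE {
  {
    # Inner query: the counting query from above, with the aggregate
    # repeated in HAVING instead of referring to the ?count alias.
    SELECT ?movie (COUNT(?wikipage) AS ?count) WHERE {
      hint:Query hint:optimizer "None" .
      ?movie wdt:P31 wd:Q11424 .
      ?wikipage schema:about ?movie .
      ?wikipage schema:isPartOf/wikibase:wikiGroup "wikipedia" .
    }
    GROUP BY ?movie
    HAVING (COUNT(?wikipage) > 10)
  }
  # The label service binds ?movieLabel for each film that passed the filter.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?count)
LIMIT 50
```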

Alternatively, you could consider the total number of sitelinks. Sitelinks include links to Wikipedia as well as to Wikiquote, Wikivoyage, etc. The advantage is that the total number of sitelinks is precomputed, so the query needs no counting or grouping:

```sparql
SELECT ?movie ?sitelinks WHERE {
  ?movie wdt:P31 wd:Q11424 .
  ?movie wikibase:sitelinks ?sitelinks .
  FILTER (?sitelinks > 10)
} ORDER BY DESC(?sitelinks)
```

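
The sitelinks query can return titles directly. Because the count is a plain triple (`wikibase:sitelinks`), there is no aggregation to work around. This sketch assumes the Wikidata Query Service label service, as used in the question. The `LIMIT 50` is an arbitrary cut-off.

```sparql
# Films with more than 10 sitelinks, most-linked first, with titles.
SELECT ?movie ?movieLabel ?sitelinks WHERE {
  ?movie wdt:P31 wd:Q11424 ;
         wikibase:sitelinks ?sitelinks .
  FILTER (?sitelinks > 10)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
ORDER BY DESC(?sitelinks)
LIMIT 50
```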



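As @TallTed and @AKSW have pointed out, the number of labels in different languages may differ from the number of Wikipedia articles in different languages. Below is a comparison. In the tables, *articles* is the number of Wikipedia articles, *sitelinks* is the total number of sitelinks across all linked Wikimedia projects, and *labels* is the number of languages the item has a label in.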
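
All three columns can be computed in one query for a handful of films. This sketch, assuming the Wikidata Query Service, takes the five films with the most sitelinks and counts their Wikipedia articles and label languages. It is not necessarily how the tables here were produced, and current data will likely give different numbers.

```sparql
# For the 5 films with the most sitelinks, count Wikipedia articles
# and label languages alongside the precomputed sitelink total.
SELECT ?movie ?title ?sitelinks
       (COUNT(DISTINCT ?wikipage) AS ?articles)
       (COUNT(DISTINCT ?label) AS ?labels)
WHERE {
  {
    SELECT ?movie ?sitelinks WHERE {
      ?movie wdt:P31 wd:Q11424 ;
             wikibase:sitelinks ?sitelinks .
    }
    ORDER BY DESC(?sitelinks)
    LIMIT 5
  }
  OPTIONAL {
    ?wikipage schema:about ?movie ;
              schema:isPartOf/wikibase:wikiGroup "wikipedia" .
  }
  # An item has at most one label per language, and each label literal
  # carries its language tag, so distinct labels = distinct languages.
  OPTIONAL { ?movie rdfs:label ?label . }
  OPTIONAL { ?movie rdfs:label ?title . FILTER (LANG(?title) = "en") }
}
GROUP BY ?movie ?title ?sitelinks
ORDER BY DESC(?sitelinks)
```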

Top 5 movies by Wikipedia articles

|        title        | articles | sitelinks | labels |
|---------------------|----------|-----------|--------|
| Avatar              |       92 |       103 |     99 |
| Titanic             |       86 |       100 |    101 |
| The Godfather       |       79 |       103 |     82 |
| Slumdog Millionaire |       72 |        75 |     80 |
| Forrest Gump        |       71 |       101 |     84 |

Top 5 movies by sitelinks

|     title     | articles | sitelinks | labels |
|---------------|----------|-----------|--------|
| Avatar        |       92 |       103 |     99 |
| The Godfather |       79 |       103 |     82 |
| Forrest Gump  |       71 |       101 |     84 |
| Titanic       |       86 |       100 |    101 |
| The Matrix    |       67 |        94 |     77 |

Top 5 movies by labels

|            title             | articles | sitelinks | labels |
|------------------------------|----------|-----------|--------|
| The 25th Reich               |        2 |         2 |    227 |
| Time Is But Brief            |        0 |         0 |    224 |
| Michael Moore in TrumpLand   |        6 |         6 |    222 |
| Magnus - The Mozart of Chess |        1 |         1 |    221 |
| Lee Chong Wei                |        1 |         1 |    196 |

Stanislav Kralin