I am using DBpedia Live (http://dbpedia-live.openlinksw.com/sparql/) to get basic details of notable people. My query is:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?x0 ?name ?dob WHERE {
?x0 rdf:type foaf:Person.
?x0 rdfs:label ?name.
?x0 dbpedia-owl:birthDate ?dob.
FILTER REGEX(?name,"^[A-Z]","i").
} LIMIT 200
This works, and I use LIMIT 200 to restrict the output to a small number of people. My problem is that the 200 people returned are effectively random. I want some way of measuring 'notability' so that the query returns 200 notable people rather than 200 random ones. There are over 500,000 people in DBpedia.
My question is: how can I measure 'notability' and limit the query to notable people only? I realize there is no 'notability' property and that the notion is very subjective. I am happy to use any indirect or approximate measure, such as the number of links or the number of references, but I don't know how to do this.
Edit: As a result of the helpful comments, I improved the query to include PageRank values:
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX vrank: <http://purl.org/voc/vrank#>
SELECT DISTINCT ?s ?name2 ?dob ?v
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
?s rdf:type foaf:Person.
?s rdfs:label ?name.
?s dbo:birthDate ?dob.
?s vrank:hasRank/vrank:rankValue ?v.
FILTER REGEX(?name,"^[A-Z].*").
BIND (str(?name) AS ?name2)
} ORDER BY DESC(?v) LIMIT 100
The problem now is that there are lots of duplicates, even though I am using DISTINCT.
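One possible cause, offered as an assumption rather than a confirmed diagnosis: DISTINCT compares whole result rows, so a person who has rdfs:label values in several languages, or more than one dbo:birthDate value, still produces several distinct rows. A minimal sketch under that assumption keeps only English labels and collapses each person to a single row with GROUP BY. The aliases ?dob1 and ?rank are names introduced here for illustration, and the query has not been checked against the live endpoint.

```sparql
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX dbo:   <http://dbpedia.org/ontology/>
PREFIX vrank: <http://purl.org/voc/vrank#>

# One row per person: aggregates pick a single label, birth date and rank.
SELECT ?s
       (SAMPLE(STR(?name)) AS ?name2)   # any one English label, language tag stripped
       (MIN(?dob) AS ?dob1)             # earliest date if several are recorded (hypothetical alias)
       (MAX(?v) AS ?rank)               # rank value used for ordering (hypothetical alias)
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
  ?s rdf:type foaf:Person ;
     rdfs:label ?name ;
     dbo:birthDate ?dob ;
     vrank:hasRank/vrank:rankValue ?v .
  # Assumption: the duplicates come mainly from labels in other languages.
  FILTER (LANGMATCHES(LANG(?name), "en"))
  FILTER REGEX(STR(?name), "^[A-Z]")
}
GROUP BY ?s
ORDER BY DESC(?rank)
LIMIT 100
```

If duplicates remain after this, they would have to come from something other than multiple labels or birth dates, for example the same person appearing under more than one resource URI.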