
I am using DBpedia Live (http://dbpedia-live.openlinksw.com/sparql/) to get basic details of notable people. My query is:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?x0 ?name ?dob WHERE {
  ?x0 rdf:type foaf:Person.
  ?x0 rdfs:label ?name.
  ?x0 dbpedia-owl:birthDate ?dob.
  FILTER REGEX(?name,"^[A-Z]","i").
} LIMIT 200

This works, and I use LIMIT 200 to keep the output small. My problem is that the 200 people returned are essentially arbitrary. I want some way of measuring 'notability' so that the query returns 200 notable people rather than 200 random ones. There are over 500,000 people in DBpedia.

My question is: how can I measure 'notability' and limit the query to notable people only? I realize there is no 'notability' property and that the notion is very subjective. I am happy to use any indirect or approximate measure, such as the number of links or the number of references, but I don't know how to do this.
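One approximate measure along these lines is the number of incoming wiki links per person. The query below is a minimal sketch, assuming the endpoint has DBpedia's page-links data loaded as `dbo:wikiPageWikiLink` triples; it counts how many resources link to each `foaf:Person` and keeps the 200 most-linked. Because it aggregates over every person before the LIMIT applies, it may well time out on a public endpoint.

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbo:  <http://dbpedia.org/ontology/>

# Count incoming wiki links as a rough proxy for notability.
# Assumes dbo:wikiPageWikiLink triples are present on the endpoint.
SELECT ?person (COUNT(DISTINCT ?source) AS ?inlinks)
WHERE {
  ?person rdf:type foaf:Person .
  ?source dbo:wikiPageWikiLink ?person .   # ?source links to ?person
}
GROUP BY ?person
ORDER BY DESC(?inlinks)                     # most-linked people first
LIMIT 200
```

COUNT(DISTINCT ?source) is used so that a page linking to the same person twice is counted once.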

Edit: Thanks to the helpful comments, I improved the query to include PageRank values:

PREFIX rdf:<http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo:<http://dbpedia.org/ontology/>
PREFIX vrank:<http://purl.org/voc/vrank#>
PREFIX foaf:<http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?s ?name2 ?dob ?v
FROM <http://dbpedia.org> 
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank> 
WHERE {
  ?s rdf:type foaf:Person.
  ?s rdfs:label ?name.
  ?s dbo:birthDate ?dob.
  ?s vrank:hasRank/vrank:rankValue ?v.
  FILTER REGEX(?name,"^[A-Z].*").
  BIND (str(?name) AS ?name2)
} ORDER BY DESC(?v) LIMIT 100

The problem now is that the results contain lots of duplicates, even though I am using DISTINCT.
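DISTINCT only removes rows that are identical in every projected column (?s, ?name2, ?dob, ?v here). If a person has several label strings that pass the regex, several dbo:birthDate values, or several rank values, each combination survives as a separate row. The sketch below, assuming the same graphs as the query above and that an English label is wanted, keeps exactly one row per ?s by grouping on it and aggregating the other columns. It is one possible fix, not a confirmed diagnosis of which column causes the duplicates.

```sparql
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs:  <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf:  <http://xmlns.com/foaf/0.1/>
PREFIX dbo:   <http://dbpedia.org/ontology/>
PREFIX vrank: <http://purl.org/voc/vrank#>

# One row per person: GROUP BY ?s and aggregate everything else.
SELECT ?s (SAMPLE(STR(?name)) AS ?name2) (MIN(?dob) AS ?birth) (MAX(?v) AS ?rank)
FROM <http://dbpedia.org>
FROM <http://people.aifb.kit.edu/ath/#DBpedia_PageRank>
WHERE {
  ?s rdf:type foaf:Person .
  ?s rdfs:label ?name .
  FILTER (langMatches(lang(?name), "en"))   # keep English labels only
  FILTER (REGEX(STR(?name), "^[A-Z]"))
  ?s dbo:birthDate ?dob .
  ?s vrank:hasRank/vrank:rankValue ?v .
}
GROUP BY ?s
ORDER BY DESC(?rank)                         # highest PageRank first
LIMIT 100
```

SAMPLE returns an arbitrary label from the group, and MIN/MAX pick one birth date and one rank; new variable names (?birth, ?rank) are needed because SPARQL does not allow an aggregate to be bound to a variable already used in the WHERE pattern.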

Ubercoder
  • By "notability", do you mean how well the person is known by other people? Otherwise, you can simply count the number of relations that have the person as subject or object. It gives you an idea of how heavily the node is used in the graph. – Gilles-Antoine Nys May 08 '18 at 14:46
  • Alternatively, `dbp:votesmart` can give you an idea, but not for all people. – Gilles-Antoine Nys May 08 '18 at 14:48
  • As you said, it is very subjective and varies according to interpretation. IMHO, the easiest thing to do would be to check the "dct:subject" of the resources and select a category that groups people by some notion of notability. If you check dbr:Pablo_Picasso, you will see the dbc:Modern_painters category, and you can then query this category for other people. However, I don't know if this would be useful for you. – Erwarth May 08 '18 at 14:57
  • http://people.aifb.kit.edu/ath/ (scroll down). Not sure this works on DBpedia **Live**, though. – Stanislav Kralin May 08 '18 at 15:00
  • I will try the approach of using the number of relations as an approximate measure of "notability". I will look in my SPARQL book this evening to figure out how to do it (I know how to do it in SQL, but obviously that's no help here!) – Ubercoder May 08 '18 at 16:25
  • It's unfortunate that regular DBpedia (not DBpedia Live) is not an option... Counting the number of incoming or outgoing statements for all `foaf:Person`s is time-consuming. BTW: https://stackoverflow.com/a/46797845/7879193 – Stanislav Kralin May 08 '18 at 17:43
  • I switched to DBpedia Live because the non-live version had duplicate dates of birth. Anyway, thanks for your link. I had a look, and someone there points to a source of PageRank data, which is good news. I edited my original question to show the new query, and the results look good (the notable people are at the top). My problem now is that it returns lots of duplicates! – Ubercoder May 10 '18 at 10:26
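The category approach from the comments can be sketched as a query that lists people filed under one `dct:subject` category. This is a minimal sketch, assuming the endpoint exposes `dct:subject` links to `dbc:` category resources; dbc:Modern_painters is simply the example category mentioned above, and any other category IRI can be substituted.

```sparql
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>

# People filed under a single Wikipedia category, English labels only.
SELECT DISTINCT ?person ?name
WHERE {
  ?person dct:subject dbc:Modern_painters .   # example category from the comments
  ?person rdfs:label ?name .
  FILTER (langMatches(lang(?name), "en"))
}
LIMIT 200
```

This narrows the result set by topic rather than ranking it, so it could be combined with the PageRank pattern from the edited question if an ordering is also needed.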
