
Q: I am trying to get a list of all persons on Wikipedia, along with their age, birth_date, death_date (if present), and country.

I am using this DBpedia query, which seems to return only 50,000 results; that is definitely not the full set. Many entries are missing, e.g., Mick Jagger.

SELECT ?person ?birthDate ?birthName ?occupation WHERE 
{
 ?person a <http://dbpedia.org/ontology/Person> .
 ?person dbpedia-owl:birthDate ?birthDate .
 ?person dbpedia-owl:birthName ?birthName .
 ?person dbpedia-owl:occupation ?occupation 
}

I also tried some variations:

  select ?Person 
  where {
  ?Person a dbpedia-owl:Person 
  }

Can someone give me some direction on how to achieve this? This is my first time using DBpedia, so I may be missing something trivial.

I need as much data as I can get about persons (possibly millions of people, with their age, country, and birth_date). 50k is far too few, and the results are also missing some names that I must have.

Joshua Taylor
Udit Gupta
  • 1
    Have you already read the Fair Use Policy under section 1.1 (http://wiki.dbpedia.org/OnlineAccess)? Maybe those links will help. I guess you need to split up your query. As an alternative, download the complete Wikipedia (via http://xowa.sourceforge.net/, for example) and extract the data from your local copy. – Joachim Rohde Oct 14 '14 at 20:04
  • 2
    I've removed the bit about "could someone please direct me towards a correct resource", because open-ended resource requests are off-topic for Stack Overflow. – Joshua Taylor Oct 14 '14 at 20:25

1 Answer


It's relatively easy to get all the triples about persons:

select ?s ?p ?o { ?s a dbpedia-owl:Person ; ?p ?o }

Alternatively, you could get the results back as an RDF graph with a construct query:

construct where { ?s a dbpedia-owl:Person ; ?p ?o }

That said, you're going to hit some reasonable limits imposed by the public DBpedia endpoint. After all, your local library might make free photocopies of specific pages of books, but if you blindly ask for a photocopy of every autobiography in the building, they'd be right to refuse you on the grounds that that wouldn't be fair to other patrons. You'll need to download the data and query it locally if you want this sort of data.
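
A minimal sketch of a narrower request, assuming only the birth and death dates are needed and that `dbpedia-owl:deathDate` is the property holding the latter (an assumption; check the ontology for the properties you actually want). Every plain triple pattern in a SPARQL query is required, so a person lacking any one of the requested properties is dropped from the results; wrapping a pattern in `OPTIONAL` keeps such persons and simply leaves that variable unbound. The `LIMIT` value is arbitrary.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?person ?birthDate ?deathDate WHERE {
  # Required: only persons that have a birth date are returned.
  ?person a dbpedia-owl:Person ;
          dbpedia-owl:birthDate ?birthDate .
  # Optional: living persons have no death date, so do not require one.
  OPTIONAL { ?person dbpedia-owl:deathDate ?deathDate }
}
LIMIT 1000
```

Asking for a few named properties instead of every `?p ?o` pair keeps each response much smaller, although the public endpoint's limits still apply.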

Joshua Taylor
  • Thanks. Can you also update your answer to show how to split the same query into multiple calls? Getting 50k per call may be feasible. Creating a local repo may not be a feasible option. – Udit Gupta Oct 14 '14 at 21:05
  • OK, got it: `OFFSET` is the one to use. Thanks anyway :-) – Udit Gupta Oct 14 '14 at 21:23
  • 2
    If you use `offset`, then you also need to use `order by`; otherwise there's no predictable order into which you're offsetting. – Joshua Taylor Oct 15 '14 at 00:40
  • @UditGupta Stack Overflow is not just about finding the original asker an answer, but later users who find the question, too. If you found a solution to your problem, please post an answer and mark it as accepted. (It's quite alright to accept your own answer.) – Joshua Taylor Oct 15 '14 at 00:41
  • Oh yes, you are right. I found the mistake. It is not allowing me to use both OFFSET and ORDER BY. A limit has been imposed on the number of rows. Here is a related issue, but somehow that one is not working for me either: `https://github.com/mff-uk/DPUs/issues/78`. So the problem is still unsolved :-( – Udit Gupta Oct 15 '14 at 01:05
  • 1
    @UditGupta What do you mean you can't use OFFSET and ORDER BY? You *need* ORDER BY when using OFFSET. ORDER BY does have to come *before* OFFSET, though. E.g., `select * { ... } ORDER BY ... OFFSET ...` is OK, but `select * { ... } OFFSET ... ORDER BY ...` is not. – Joshua Taylor Oct 15 '14 at 01:08
  • I mean there is nothing wrong with ORDER BY and OFFSET in general, but DBpedia is not allowing me to query more than 40K rows. That is exactly what is mentioned in the GitHub link I posted in the previous comment, so it is not even allowing me to use ORDER BY. – Udit Gupta Oct 15 '14 at 02:15
  • 1
    @UditGupta See [my answer](http://stackoverflow.com/a/20939857/1281433) to [How to get all companies from DBPedia?](http://stackoverflow.com/q/20937556/1281433) for a workaround to that problem. The github issue that you linked to actually provides a solution, and the one in that answer is pretty much the same. – Joshua Taylor Oct 15 '14 at 02:18
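
A sketch of the kind of paging the comments above discuss, assuming the row-limit error is triggered by sorting a large result at the top level of the query. The ordering is moved into a subquery, and `LIMIT` and `OFFSET` are applied outside it. The page size of 10000 is an arbitrary choice, and the approach relies on the endpoint keeping the inner order when it slices the outer result, which is an assumption here, not something verified against the current DBpedia endpoint.

```sparql
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?person ?birthDate WHERE {
  {
    # Inner query: produce the full result in a stable order.
    # Sorting on every selected variable makes the order deterministic
    # even when one person has several birth dates recorded.
    SELECT ?person ?birthDate WHERE {
      ?person a dbpedia-owl:Person ;
              dbpedia-owl:birthDate ?birthDate .
    }
    ORDER BY ?person ?birthDate
  }
}
# Outer query: take one page. Keep LIMIT fixed and raise OFFSET by the
# page size (0, 10000, 20000, ...) until a request returns no rows.
LIMIT 10000
OFFSET 0
```

Each page is a separate request, so the fair-use limits of the public endpoint still apply to the total.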