
I run the following query from the SPARQL Explorer at DBpedia. I want to get each president only once, but because some of them have multiple birthplace entries, the query returns multiple rows for them.

SELECT DISTINCT ?person ?birthPlace ?presidentStart ?presidentEnd
WHERE {
  ?person dct:subject dbc:Presidents_of_the_United_States .
  ?person dbo:birthPlace ?birthPlace .

  OPTIONAL { ?person dbp:presidentEnd   ?presidentEnd } .
  OPTIONAL { ?person dbp:presidentStart ?presidentStart } .

  FILTER ( regex(?birthPlace, "_") OR
           regex(?birthPlace, ";_") ) .
}
GROUP BY ?person
ORDER BY ?presidentStart ?person
LIMIT 100

I would like to have only the STATE where they were born. Currently the output looks like this:

person                  birthPlace                                         presidentStart  presidentEnd
:Abraham_Lincoln        :Hodgenville,_Kentucky                             -               -
:Barack_Obama           :Kapiolani_Medical_Center_for_Women_and_Children   -               -
:Bill_Clinton           :Hope,_Arkansas                                    -               -
:Dwight_D._Eisenhower   :Denison,_Texas                                    -               -
:George_W._Bush         :New_Haven,_Connecticut                            -               -
:George_Washington      :Westmoreland_County,_Virginia                     -               -
:George_Washington      :British_America                                   -               -
:George_Washington      :George_Washington_Birthplace_National_Monument    -               -
:James_A._Garfield      :Orange,_Ohio                                      -               -
:James_A._Garfield      :Moreland_Hills,_Ohio                              -               -
:Jimmy_Carter           :Plains,_Georgia
TallTed
Paul
  • Possible duplicate of [Aggregating results from SPARQL query](https://stackoverflow.com/questions/18212697/aggregating-results-from-sparql-query) – Jeen Broekstra Dec 07 '17 at 21:21
  • Slight amendment, it's not a duplicate because you're looking to zoom in on a specific value, rather than aggregate. – Jeen Broekstra Dec 07 '17 at 21:24

2 Answers


Because SPARQL is a pattern-matching language, the trick when your query result is "too broad/general" is to create a more specific pattern. In this case, your intent is not to get back all resources that are dbo:birthPlace values, but only those resources that represent U.S. states.

So we need to figure out how U.S. states are distinguished from other locations in DBPedia.

Let's take Kentucky as an example. The resource representing Kentucky is http://dbpedia.org/resource/Kentucky . If we scroll down the page outlining the properties of that resource, we find multiple entries for the rdf:type relation, but the one that jumps out at me as most suitable is yago:WikicatStatesOfTheUnitedStates (http://dbpedia.org/class/yago/WikicatStatesOfTheUnitedStates).
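If you'd rather inspect the candidate classes from the endpoint than scroll the resource page, a query along these lines lists every rdf:type of Kentucky. This is a sketch: the PREFIX line spells out an IRI that the DBpedia endpoint normally predefines, and the STRSTARTS filter is an optional addition that narrows the list to the YAGO namespace quoted above.

```sparql
PREFIX dbr: <http://dbpedia.org/resource/>

# List the classes that the Kentucky resource is typed with
SELECT DISTINCT ?type
WHERE {
  dbr:Kentucky a ?type .
  # Optional: keep only YAGO classes such as yago:WikicatStatesOfTheUnitedStates
  FILTER ( STRSTARTS(STR(?type), "http://dbpedia.org/class/yago/") )
}
ORDER BY ?type
```

Dropping the FILTER line shows the full list of types, including the non-YAGO ones.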

If we modify your query to put that in as an extra restriction, and drop the weird regular expression, like so:

SELECT DISTINCT ?person ?birthPlace ?presidentStart ?presidentEnd
WHERE {
  ?person dct:subject dbc:Presidents_of_the_United_States .
  ?person dbo:birthPlace ?birthPlace .
  # only keep birthplaces that are typed as U.S. states
  ?birthPlace a yago:WikicatStatesOfTheUnitedStates .

  OPTIONAL { ?person dbp:presidentEnd   ?presidentEnd }
  OPTIONAL { ?person dbp:presidentStart ?presidentStart }
}
ORDER BY ?presidentStart ?person
LIMIT 100

You should get what you need.

Unfortunately, if you try, you find that you don't. This is because DBPedia data is messy. The above query only returns three results, and worse, one result is clearly incorrect:

person                 birthPlace   presidentStart  presidentEnd
dbr:Barack_Obama       dbr:Hawaii
dbr:George_Washington  dbr:Virginia
dbr:Theodore_Roosevelt dbr:New_York_City        

There are two things going on here. First, New York City is incorrectly classified as a state in DBPedia. Second, most presidents do not have their state explicitly recorded as their birthplace, only something more specific such as their home town.
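The first problem can be checked directly with an ASK query. A sketch, using the yago: class IRI given above: it returns true if New York City is still typed as a U.S. state in the data you're querying, which may have changed since this answer was written.

```sparql
PREFIX dbr:  <http://dbpedia.org/resource/>
PREFIX yago: <http://dbpedia.org/class/yago/>

# true if DBpedia classifies New York City as a U.S. state
ASK { dbr:New_York_City a yago:WikicatStatesOfTheUnitedStates }
```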

Fortunately, we can amend the query slightly. DBPedia knows that Hodgenville, Kentucky, is located in Kentucky. How does it know? Have a look at the resource page for Hodgenville: http://dbpedia.org/resource/Hodgenville,_Kentucky . You'll see that it has a dbo:isPartOf relation with the resource representing the state of Kentucky.
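The same relation can be fetched from the endpoint rather than read off the resource page. In this sketch the full IRI is written in angle brackets because a comma is not allowed unescaped in a SPARQL prefixed name, so dbr:Hodgenville,_Kentucky would not parse as written.

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

# What does DBpedia say Hodgenville is part of?
SELECT ?whole
WHERE {
  <http://dbpedia.org/resource/Hodgenville,_Kentucky> dbo:isPartOf ?whole .
}
```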

So, we need to rephrase our query again: we want the state for each president where their birthplace is part of that state. In SPARQL:

SELECT DISTINCT ?person ?birthState ?presidentStart ?presidentEnd
WHERE {
  ?person dct:subject dbc:Presidents_of_the_United_States .
  ?person dbo:birthPlace ?birthPlace .
  # the birthplace (e.g. a town) is part of some resource...
  ?birthPlace dbo:isPartOf ?birthState .
  # ...and that resource must be a U.S. state
  ?birthState a yago:WikicatStatesOfTheUnitedStates .

  OPTIONAL { ?person dbp:presidentEnd   ?presidentEnd }
  OPTIONAL { ?person dbp:presidentStart ?presidentStart }
}
ORDER BY ?presidentStart ?person
LIMIT 100

This should get you almost exactly the result you need.

Update: as you noted, Donald Trump is missing from the list. This appears to be because DBPedia is behind the times: he's still classified as a "presidential candidate" rather than a president.
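To see why a particular person is missing, you can list the dct:subject categories of their resource, since that is the property the queries above match on. A sketch; the result depends on the DBpedia release you query, so Donald Trump's categories today may differ from what they were when this was written.

```sparql
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dct: <http://purl.org/dc/terms/>

# Which categories is Donald Trump's resource assigned to?
SELECT ?category
WHERE {
  dbr:Donald_Trump dct:subject ?category .
}
ORDER BY ?category
```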

As for Grover Cleveland appearing four times, this is an interesting anomaly. Cleveland served two non-consecutive terms as president, from 1885 to 1889 and again from 1893 to 1897, so there are two start dates and two end dates. Because DBPedia does not explicitly model which start date belongs to which end date, you simply get a result for each combination of start and end dates: four in total. There may be a way to query around this (one option would be to group start and end dates together using a group_concat aggregate), but it's such an edge case that it might be simpler to just handle it in post-processing.
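The group_concat option could look roughly like the sketch below (not tested against the live endpoint). It groups by president and state, so Grover Cleveland should collapse to a single row with both start dates in one column and both end dates in another. STR() turns each date into a plain string for concatenation, and DISTINCT inside the aggregate removes the repeats produced by the start/end cross product. The ?firstStart column is an addition used only for ordering. Note that SPARQL does not guarantee the order of concatenated values, so the n-th start date is not guaranteed to line up with the n-th end date.

```sparql
PREFIX dct:  <http://purl.org/dc/terms/>
PREFIX dbc:  <http://dbpedia.org/resource/Category:>
PREFIX dbo:  <http://dbpedia.org/ontology/>
PREFIX dbp:  <http://dbpedia.org/property/>
PREFIX yago: <http://dbpedia.org/class/yago/>

SELECT ?person ?birthState
       (MIN(?presidentStart) AS ?firstStart)
       (GROUP_CONCAT(DISTINCT STR(?presidentStart); SEPARATOR=", ") AS ?starts)
       (GROUP_CONCAT(DISTINCT STR(?presidentEnd);   SEPARATOR=", ") AS ?ends)
WHERE {
  ?person dct:subject dbc:Presidents_of_the_United_States .
  ?person dbo:birthPlace ?birthPlace .
  ?birthPlace dbo:isPartOf ?birthState .
  ?birthState a yago:WikicatStatesOfTheUnitedStates .

  OPTIONAL { ?person dbp:presidentEnd   ?presidentEnd }
  OPTIONAL { ?person dbp:presidentStart ?presidentStart }
}
# one row per (president, state) pair instead of one per date combination
GROUP BY ?person ?birthState
ORDER BY ?firstStart ?person
LIMIT 100
```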

Jeen Broekstra
  • Hi, and thanks for your answer, it did open my mind. As far as I know there should be 45 presidents, and in your list "Grover Cleveland State_of_New_Jersey" appears four times. By the way, Donald Trump does not appear either. – Paul Dec 09 '17 at 08:43

Focusing on

I would like to have only the STATE where they are born

rather than on

How to get rid of multiple rows with DBPEDIA SPARQL

this could be a solution:

SELECT DISTINCT ?person ?birthState ?presidentStart ?presidentEnd
WHERE {
  ?person dct:subject dbc:Presidents_of_the_United_States .

  OPTIONAL { ?person dbp:presidentEnd   ?presidentEnd }
  OPTIONAL { ?person dbp:presidentStart ?presidentStart }
  # follow birthPlace -> subdivisionType -> territory to reach the state
  OPTIONAL { ?person dbo:birthPlace/dbp:subdivisionType/dbp:territory ?birthState }

  FILTER ( regex(?birthState, "_") OR
           regex(?birthState, ";_") )
}
ORDER BY ?presidentStart ?person
LIMIT 100
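In case the property path is unfamiliar: dbo:birthPlace/dbp:subdivisionType/dbp:territory is a sequence path, which SPARQL evaluates as a chain of triple patterns through intermediate nodes. Spelled out with two illustrative variables (?place and ?subdivision, which are not part of the original query), the state lookup is equivalent to the sketch below, which can be handy for inspecting what each hop returns.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX dbc: <http://dbpedia.org/resource/Category:>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

# The sequence path written out hop by hop, projecting the intermediate nodes
SELECT DISTINCT ?person ?place ?subdivision ?birthState
WHERE {
  ?person dct:subject dbc:Presidents_of_the_United_States .
  OPTIONAL {
    ?person      dbo:birthPlace      ?place .        # hop 1: birthplace
    ?place       dbp:subdivisionType ?subdivision .  # hop 2: subdivision type
    ?subdivision dbp:territory       ?birthState .   # hop 3: territory, used as the state
  }
}
ORDER BY ?person
LIMIT 100
```

Because the three patterns sit in one OPTIONAL, a president whose chain breaks at any hop gets no ?birthState, exactly as with the path form.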
Ivo Velitchkov
  • Hi, and thanks for your answer, it did open my mind. As far as I know there should be 45 presidents, and in your list "Grover Cleveland State_of_New_Jersey" appears four times. By the way, Donald Trump does not appear either. – Paul Dec 09 '17 at 08:42