How can I get a list of all organisations from DBpedia? By "organisation", I mean an entity of any type that is either an organisation or any subclass of organisation.

I found the question "How to get all companies from DBPedia?", but its query doesn't work on the current DBpedia SPARQL web endpoint, and I wasn't able to adapt it.

Josef
  • It would be good to see the query that you tried, and what "does not work" means: returned nothing, an incomplete result, the wrong result... – UninformedUser Apr 25 '16 at 14:54
  • In the query from your link, the prefix `dbpedia-owl` is now `dbo`. – UninformedUser Apr 25 '16 at 14:55
  • Please note that I have updated my answer: the filter was not in the right place. – Ivo Velitchkov Apr 25 '16 at 14:57
  • I've updated the answer to [How to get all companies from DBPedia?](https://stackoverflow.com/questions/20937556/how-to-get-all-companies-from-dbpedia) to include the appropriate prefix definition. – Joshua Taylor Apr 25 '16 at 15:57

2 Answers

To simply get all resources that are instances of dbo:Organisation or of any of its subclasses:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT DISTINCT ?org { ?org a/rdfs:subClassOf* dbo:Organisation . }

However, as the question you linked shows, DBpedia has a cap on how many results are returned. So, as in the answer to said question, you can use a subquery with LIMIT and OFFSET to get all the results in chunks:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?org {
  SELECT DISTINCT ?org {
    ?org a/rdfs:subClassOf* dbo:Organisation .
  } ORDER BY ?org
}
LIMIT 10000 OFFSET 0

This would get you the first 10000 results. To get the next 10000, just add 10000 to the offset: LIMIT 10000 OFFSET 10000. Then, the next 10000 with OFFSET 20000, and so on.
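The chunked retrieval described above can be automated with a small client loop. The following is only a sketch under assumptions that are not in the answer: the endpoint URL `https://dbpedia.org/sparql`, the `format` request parameter, and the helper names `build_query`, `fetch_page` and `iter_organisations` are all illustrative choices.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia endpoint lives at this URL.
ENDPOINT = "https://dbpedia.org/sparql"

# The paged query from the answer; doubled braces are literal SPARQL braces.
QUERY_TEMPLATE = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?org {{
  SELECT DISTINCT ?org {{
    ?org a/rdfs:subClassOf* dbo:Organisation .
  }} ORDER BY ?org
}}
LIMIT {limit} OFFSET {offset}
"""


def build_query(limit, offset):
    """Return the paged query text for one chunk."""
    return QUERY_TEMPLATE.format(limit=limit, offset=offset)


def fetch_page(query):
    """Send one query over HTTP and return the ?org URIs of that chunk.

    Assumption: the endpoint accepts 'query' and 'format' parameters and
    answers in the SPARQL JSON results format.
    """
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(f"{ENDPOINT}?{params}") as response:
        data = json.load(response)
    return [binding["org"]["value"] for binding in data["results"]["bindings"]]


def iter_organisations(page_size=10000, fetch=fetch_page):
    """Yield every result by raising OFFSET until a short page comes back."""
    offset = 0
    while True:
        page = fetch(build_query(page_size, offset))
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size
```

Calling `list(iter_organisations())` would issue one request per chunk. The stop condition assumes the server returns full pages until the data runs out, so `page_size` should not exceed the endpoint's own result cap.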

evsheino
  • When I read your answer, I first thought that using `*` (zero or more) was simply a more efficient way to write the same thing. But in fact it gives more results than `a|a/rdfs:subClassOf+`. How would you explain that? – Ivo Velitchkov Apr 25 '16 at 14:50
  • I get the same number of results with count distinct (350113). – evsheino Apr 25 '16 at 15:31
  • Indeed, but when I use it in the query I suggested, I get 350113 with `|+` and 386676 with `*`. Any idea why? – Ivo Velitchkov Apr 25 '16 at 16:00
  • I've no idea why they differ. I don't think they should, though. – evsheino Apr 25 '16 at 16:46
  • `+` matches only instances of subclasses (direct or indirect) of `dbo:Organisation`, while `*` also includes instances that are asserted directly as `dbo:Organisation`. – UninformedUser Apr 25 '16 at 20:12
  • @AKSW Yes, but the property paths in question are `a|a/rdfs:subClassOf+` vs. `a/rdfs:subClassOf*`, which are equal if I'm not mistaken. – evsheino Apr 26 '16 at 04:59
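To make the comparison in these comments concrete, here is a toy evaluation of both property paths over a tiny in-memory class hierarchy. It is only an illustration: the class names, resources and helper functions are made up, and it models set semantics (as with `DISTINCT`), so it does not explain the differing counts reported from the live endpoint.

```python
def superclasses(cls, sub_class_of, include_self):
    """Classes reachable from cls via rdfs:subClassOf.

    include_self=False models subClassOf+ (one or more steps),
    include_self=True models subClassOf* (zero or more steps).
    """
    seen = set()
    stack = list(sub_class_of.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(sub_class_of.get(current, ()))
    if include_self:
        seen.add(cls)
    return seen


def instances_star(types, sub_class_of, target):
    """Resources matched by: ?r a/rdfs:subClassOf* target"""
    return {r for r, cls in types
            if target in superclasses(cls, sub_class_of, True)}


def instances_alt_plus(types, sub_class_of, target):
    """Resources matched by: ?r a|a/rdfs:subClassOf+ target"""
    direct = {r for r, cls in types if cls == target}
    via_subclass = {r for r, cls in types
                    if target in superclasses(cls, sub_class_of, False)}
    return direct | via_subclass


# Hypothetical toy data: Airline is a subclass of Company,
# which is a subclass of Organisation.
SUB_CLASS_OF = {"Company": ["Organisation"], "Airline": ["Company"]}
TYPES = [("acme", "Company"), ("un", "Organisation"),
         ("klm", "Airline"), ("alice", "Person")]

star = instances_star(TYPES, SUB_CLASS_OF, "Organisation")
alt_plus = instances_alt_plus(TYPES, SUB_CLASS_OF, "Organisation")
print(sorted(star), star == alt_plus)
```

On this data both paths select the same three resources, which matches the claim that the two expressions are equivalent once duplicates are ignored.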

You can get all organisations with a query like this, which also gives you the English label and the Wikipedia page for those resources that have them:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX    o: <http://dbpedia.org/ontology/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT DISTINCT   ?orgURI ?orgName ?Wikipedia_page

WHERE {
           ?orgURI  a                  o:Organisation .

OPTIONAL { ?orgURI  rdfs:label         ?orgName . 
                    FILTER (lang(?orgName) = "en") }

OPTIONAL { ?orgURI  ^foaf:primaryTopic ?Wikipedia_page }

}

ORDER BY ?orgName

This will currently return 350033 results for those resources that are classified as http://dbpedia.org/ontology/Organisation.

To also get the members of subclasses of http://dbpedia.org/ontology/Organisation, you can change the first pattern by turning the property into a property path going through zero or more rdfs:subClassOf:

?orgURI  a/rdfs:subClassOf*  o:Organisation
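Because the label and the Wikipedia page are `OPTIONAL`, some result rows will lack those variables. In the SPARQL JSON results format an unbound variable is simply absent from the row's binding object, so a client has to tolerate missing keys. The sketch below shows one way to do that; the sample response and its `example.org` URIs are invented for illustration and do not come from DBpedia.

```python
import json

# Hypothetical response in the SPARQL JSON results format. The second
# row has no label and no Wikipedia page, so those keys are missing.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["orgURI", "orgName", "Wikipedia_page"]},
  "results": {"bindings": [
    {"orgURI": {"type": "uri", "value": "http://example.org/resource/Org_A"},
     "orgName": {"type": "literal", "xml:lang": "en", "value": "Org A"},
     "Wikipedia_page": {"type": "uri", "value": "http://example.org/wiki/Org_A"}},
    {"orgURI": {"type": "uri", "value": "http://example.org/resource/Org_B"}}
  ]}
}
"""

COLUMNS = ("orgURI", "orgName", "Wikipedia_page")


def rows_from_response(text):
    """Turn a JSON results document into dicts, using None for unbound variables."""
    bindings = json.loads(text)["results"]["bindings"]
    return [
        {column: binding.get(column, {}).get("value") for column in COLUMNS}
        for binding in bindings
    ]


rows = rows_from_response(SAMPLE_RESPONSE)
for row in rows:
    print(row["orgURI"], row["orgName"], row["Wikipedia_page"])
```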
TallTed
Ivo Velitchkov