I'm wondering whether we can tell if two resources in DBpedia share a category, or belong to categories that have some common supercategory. I tried this query on the DBpedia endpoint, but it's wrong:

select distinct ?s ?s2 where {
?s skos:subject <http :// dbpedia.org/resource/ Category ?c.
?s2 skos:subject <http :// dbpedia.org/resource/ Category ?c2.
?c=?c2.
}
Joshua Taylor
BigNoob
  • Is that the actual query that you tried? It's not syntactically well formed, so it wouldn't even be accepted by the endpoint. It's much better to start with queries that are legal syntax, and to then try to modify them into returning what you want. – Joshua Taylor Dec 24 '13 at 15:32
  • Did you make any progress with this? – Joshua Taylor Dec 29 '13 at 23:22

1 Answer

DBpedia doesn't use skos:subject for resources, but rather relates resources to their Wikipedia categories using dcterms:subject. You can find out what data is available by browsing the resource pages. E.g., you might have a look at http://dbpedia.org/resource/Mount_Monadnock. If you want to find categories that two resources have in common, just use the same variable. E.g.,

?subject1 dcterms:subject ?category .
?subject2 dcterms:subject ?category .

You can write that more concisely with the ^property notation and object lists. Writing o ^p s is the same as writing s p o. Object lists let you write s p o1, o2 instead of s p o1. s p o2.. Putting these together, we can write:

?category ^dcterms:subject ?subject1, ?subject2 .

E.g., here's a query that finds common categories of Mount Monadnock and Spofford Lake. There's just one result, Landforms of Cheshire County, New Hampshire, since they only have one category in common.

select * where {
  ?category ^dcterms:subject dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
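
As the comments below note, the endpoint may not predefine every prefix used here, so it can help to declare them explicitly. The following is a sketch of the same query with the standard Dublin Core Terms namespace and the DBpedia resource namespace written out; the prefix labels themselves are arbitrary and only need to match the declarations.

```sparql
# Same common-category query, with the prefixes declared explicitly
# instead of relying on the endpoint's predefined namespaces.
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dbpedia: <http://dbpedia.org/resource/>

SELECT * WHERE {
  # ^dcterms:subject reads the dcterms:subject link backwards:
  # each listed resource must have ?category as a subject.
  ?category ^dcterms:subject dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
```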

Now, categories are related to their supercategories in DBpedia by skos:broader, as you can see in http://dbpedia.org/page/Category:Landforms_of_Cheshire_County,_New_Hampshire, where there are skos:broader links to its supercategories.

This means that if two things have some common category (or supercategory), each will be related to that category by a path starting with a dcterms:subject link and followed by zero or more skos:broader links. Thus, you could use a query like

select * where {
  ?category ^(dcterms:subject/skos:broader*) dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}

You'll find, unfortunately, that the DBpedia endpoint runs into memory usage problems with that query, so you can't run it exactly like that. However, the DBpedia SPARQL endpoint supports a property path feature that didn't make it into the standard: you can write p{n,m} to denote a chain of length at least n and at most m. This means you can use a bounded range that gets you most of the same results as *:

select distinct ?category where {
  ?category ^(dcterms:subject/(skos:broader{0,3})) dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
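
Since p{n,m} is not part of standard SPARQL 1.1, an endpoint other than DBpedia's may reject it. A portable way to express the same bound is to chain the optional-step operator: skos:broader? matches zero or one link, so three of them in sequence match zero to three links. This is a sketch that has not been run against the live endpoint, and whether it avoids the memory problem there is an open question; the prefix declarations are the standard namespaces, added so the query is self-contained.

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>
PREFIX dbpedia: <http://dbpedia.org/resource/>

SELECT DISTINCT ?category WHERE {
  # dcterms:subject, then between zero and three skos:broader steps.
  # Three chained "?" steps stand in for the non-standard {0,3}.
  ?category ^(dcterms:subject/skos:broader?/skos:broader?/skos:broader?)
      dbpedia:Mount_Monadnock, dbpedia:Spofford_Lake .
}
```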

This works with Tom Cruise and Madonna as well, though you'll need to scale back the path length a bit because of the memory issues. For instance, the following query returns seventy-four results.

select distinct ?category where {
  ?category
      ^(dcterms:subject/(skos:broader{0,2}))
          <http://dbpedia.org/resource/Tom_Cruise>,
          <http://dbpedia.org/resource/Madonna_(entertainer)> .
}
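
If you only need a yes/no answer to whether two resources share a category or supercategory within some number of steps, an ASK query may be a better fit than listing every category. Here is a minimal sketch using the same two resources and the same bound of two skos:broader steps, written with standard SPARQL 1.1 path operators rather than the {0,2} extension. The resource IRIs are taken from the query above and may have changed in current DBpedia data.

```sparql
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX skos:    <http://www.w3.org/2004/02/skos/core#>

# Returns a boolean: is there at least one category reachable from both
# resources via dcterms:subject followed by at most two skos:broader steps?
ASK {
  ?category
      ^(dcterms:subject/skos:broader?/skos:broader?)
          <http://dbpedia.org/resource/Tom_Cruise>,
          <http://dbpedia.org/resource/Madonna_(entertainer)> .
}
```

Raising the bound means adding further skos:broader? steps, with the same memory caveats as the SELECT form.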

It's worth noting, though, that Wikipedia categories aren't types. So while Mount Monadnock and Spofford Lake are both rightly considered to be landforms, neither is a geography or, as the supercategory query results show, New Hampshire. Wikipedia categories are much more about topic than about a type hierarchy.

Related reading

There's a related (but not quite duplicate) question that you might find helpful as well: Using SPARQL to locate a subject with multiple occurrences of same property.

Joshua Taylor
  • @user3132739 OK, updated to show how you can find common category or supercategory. – Joshua Taylor Dec 24 '13 at 15:29
  • Hi, when I try to run the queries you have mentioned in the above answer I get an error saying "Virtuoso 37000 Error SP030: SPARQL compiler, line 5: Undefined namespace prefix at 'dcterms' before '/'". Please let me know how to resolve this issue :) – EmJ Jun 23 '19 at 03:53
  • @emi yes, DBpedia changed some of the predefined namespaces on their endpoint. You'll need to either update the query by adding a DC Terms namespace prefix, or update the body of the query to use whatever prefix they define for DC Terms. – Joshua Taylor Jun 24 '19 at 11:25