
Let's say you have a document that mentions "Turkey" and "Istanbul", and you want to extract those keywords and match each of them to a Wikipedia article. "Turkey", however, is ambiguous: it could mean either Turkey the country or turkey the bird. Is it possible to use the second keyword, "Istanbul", to measure the "distance" between it and each candidate, and so pick the right "Turkey"? For example:

Istanbul -> Turkey the country -> close.

Istanbul -> turkey the bird -> distant.

To explain further what I mean by distance: as I understand it, SPARQL can traverse graphs, and DBpedia is a kind of (knowledge) graph, so the distance I am looking for could probably be measured in that graph.

Marius Lian

1 Answer


You can find the length of a path between two resources in SPARQL if there is a unique path between them. (This has been described in a number of places now; see, e.g., this answer to "Calculate length of path between nodes?".) However, that technique breaks down when multiple paths join the endpoints: it works by counting the nodes that lie on the path(s) between the resources, so with several paths the count mixes nodes from all of them and no longer corresponds to the length of any single path.
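To illustrate why the counting approach depends on a unique path, here is a minimal Python sketch that mimics it on a toy in-memory graph instead of a SPARQL endpoint. The function names (`reachable`, `counted_length`) and both edge lists are made up for this example, and the counting rule (count every node that the start reaches in zero or more steps and that reaches the end in one or more steps) is an assumption about how the technique in the linked answer works.

```python
from collections import defaultdict


def reachable(edges, start):
    """Return the set of nodes reachable from start via one or more edges."""
    graph = defaultdict(set)
    for subj, obj in edges:
        graph[subj].add(obj)
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def counted_length(edges, start, end):
    """Count nodes 'mid' with start ->* mid and mid ->+ end.

    Assumed to mirror the SPARQL counting trick: on a unique path the
    count equals the path length; otherwise it over-counts.
    """
    candidates = {start} | reachable(edges, start)
    return sum(1 for mid in candidates if end in reachable(edges, mid))


# Hypothetical data: a single chain a -> b -> c -> d (true length 3).
chain = [("a", "b"), ("b", "c"), ("c", "d")]

# Hypothetical data: two paths from a to d, each of true length 2.
diamond = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

chain_count = counted_length(chain, "a", "d")      # matches the real length
diamond_count = counted_length(diamond, "a", "d")  # does not match either path
```

On the chain the count agrees with the actual path length, while on the diamond it reports 3 even though both paths have length 2, which is the failure mode described above.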

In DBpedia, there could be lots of paths between any pair of resources, so it's rather hard to use that sort of metric. An alternative that you could use, though, is to find the closest common superclass, and use a metric based on that. That approach has been discussed in this answer to finding common superclass and length of path in class hierarchies.
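A metric of that kind could be sketched roughly as follows. This is only an illustration in plain Python, not the SPARQL from the linked answer: the `TOY_HIERARCHY` mapping is a hypothetical, simplified class hierarchy (not the real DBpedia ontology), and `ancestor_depths` and `class_distance` are names invented here. The distance is taken to be the number of subclass steps from each class up to their closest common superclass, added together.

```python
from collections import deque

# Hypothetical toy hierarchy: class -> list of direct superclasses.
TOY_HIERARCHY = {
    "City": ["Settlement"],
    "Settlement": ["PopulatedPlace"],
    "Country": ["PopulatedPlace"],
    "PopulatedPlace": ["Place"],
    "Place": ["Thing"],
    "Bird": ["Animal"],
    "Animal": ["Species"],
    "Species": ["Thing"],
}


def ancestor_depths(parents, cls):
    """Map cls and each of its superclasses to the fewest steps needed to reach it."""
    depths = {cls: 0}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in parents.get(current, ()):
            if parent not in depths:
                depths[parent] = depths[current] + 1
                queue.append(parent)
    return depths


def class_distance(parents, a, b):
    """Return (closest common superclass, combined steps), or None if there is none."""
    depths_a = ancestor_depths(parents, a)
    depths_b = ancestor_depths(parents, b)
    common = set(depths_a) & set(depths_b)
    if not common:
        return None
    # Pick the superclass with the smallest combined distance; ties break by name.
    best = min(common, key=lambda c: (depths_a[c] + depths_b[c], c))
    return best, depths_a[best] + depths_b[best]


city_country = class_distance(TOY_HIERARCHY, "City", "Country")
city_bird = class_distance(TOY_HIERARCHY, "City", "Bird")
```

With this toy data, a city and a country meet at `PopulatedPlace` after three steps in total, whereas a city and a bird only meet at the root after seven, so the country reading comes out as the closer one. Real DBpedia types are messier (resources often have several types), so the numbers would need some care.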

Joshua Taylor
  • OK, this is probably something I can use. I'm running into a strange problem with SPARQLWrapper, though. It might be off topic, but do you have any idea why the exact same query displays like this in Virtuoso: [link](http://screencast.com/t/aCSkqkdQZ) and like this in SPARQLWrapper: [link](http://screencast.com/t/Rs2dy1uGtT)? – Marius Lian Feb 11 '14 at 07:36
  • @MariusLian Hm, no, I don't. Sometimes the Virtuoso endpoint imposes timeouts and memory limits, so that might be a reason for getting different results at different times. – Joshua Taylor Feb 11 '14 at 12:40