Is it possible to compute the distance between two statements, in SPARQL or Jena? For example, is it possible to compute the distance between:
immanuel_kant dbprop:birthPlace Germany
John_Lock dbprop:birthPlace England
It's hard to tell exactly what you're trying to compute (the question doesn't say), but it sounds like you'll be able to do this in SPARQL. The following query first computes a similarity metric for pairs of philosophers and binds it to ?initialSimilarity. It's just the ratio of the lengths of their names (the shorter length over the longer). It's not a particularly good similarity measure, but you said that you've already got some of these defined (the .60 that you mentioned in the comments). Then the query retrieves the birthplaces of the two philosophers. If they're the same, .05 is added to the similarity metric; if they're different, .05 is subtracted. Either way, the result is bound to ?finalSimilarity. (Note that individuals may have multiple values for the birthPlace property, so you'll see the same pair of philosophers appear n×m times, where n is the number of birthplaces one has and m the number that the other has. You could group by pairs here and then take the average of the final similarities, or you could do something to resolve the multiple statements, e.g., sample a representative birthplace for each one.)
select ?name1 ?name2 ?bp1 ?bp2 ?initialSimilarity ?finalSimilarity where {
  dbpedia-owl:Philosopher ^a ?phil1, ?phil2 .
  ?phil1 rdfs:label ?name1 .
  ?phil2 rdfs:label ?name2 .
  filter( langMatches(lang(?name1),"en") && langMatches(lang(?name2),"en"))
  bind ( strlen(?name1) as ?len1 )
  bind ( strlen(?name2) as ?len2 )
  bind ( if(?len1 < ?len2, ?len1, ?len2) as ?minLen )
  bind ( if(?len1 < ?len2, ?len2, ?len1) as ?maxLen )
  bind ( ?minLen/xsd:double(?maxLen) as ?initialSimilarity )
  ?phil1 dbpedia-owl:birthPlace ?bp1 .
  ?phil2 dbpedia-owl:birthPlace ?bp2 .
  bind ( if( ?bp1 = ?bp2, ?initialSimilarity + .05, ?initialSimilarity - .05) as ?finalSimilarity )
}
limit 10
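The grouping idea mentioned in the note (one row per pair, averaging over the n×m birthplace combinations) could look roughly like the following. This is only a sketch: it reuses the query above and assumes the same prefixes are predefined by the endpoint. The filter that drops self-pairs and the variable name ?avgSimilarity are additions for illustration, not part of the original query. Note that a philosopher with several English labels also multiplies the rows, and those rows are averaged in as well.

```sparql
# Sketch: collapse the n×m rows per pair into a single averaged value.
select ?phil1 ?phil2 (avg(?finalSimilarity) as ?avgSimilarity) where {
  dbpedia-owl:Philosopher ^a ?phil1, ?phil2 .
  filter( ?phil1 != ?phil2 )   # added here: skip pairing a philosopher with itself
  ?phil1 rdfs:label ?name1 .
  ?phil2 rdfs:label ?name2 .
  filter( langMatches(lang(?name1),"en") && langMatches(lang(?name2),"en") )
  bind ( strlen(?name1) as ?len1 )
  bind ( strlen(?name2) as ?len2 )
  bind ( if(?len1 < ?len2, ?len1, ?len2) as ?minLen )
  bind ( if(?len1 < ?len2, ?len2, ?len1) as ?maxLen )
  bind ( ?minLen/xsd:double(?maxLen) as ?initialSimilarity )
  ?phil1 dbpedia-owl:birthPlace ?bp1 .
  ?phil2 dbpedia-owl:birthPlace ?bp2 .
  # one ?finalSimilarity per (label, birthplace) combination
  bind ( if( ?bp1 = ?bp2, ?initialSimilarity + .05, ?initialSimilarity - .05) as ?finalSimilarity )
}
group by ?phil1 ?phil2
limit 10
```

Averaging is just one way to resolve the duplicates; whether it is appropriate depends on what the similarity is supposed to mean for individuals with several recorded birthplaces.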
Based on the clarifications in the comments, it's not too hard to compute your initial similarity metric, which you've defined as the number of classes the two individuals have in common divided by the total number of distinct classes that either of them has. This can be done with a query like this:
select ?philosopher1
       ?philosopher2
       (count(distinct ?commonType) as ?intersection)
       (count(distinct ?eitherType) as ?union)
       (count(distinct ?commonType)/xsd:double(count(distinct ?eitherType)) as ?similarity)
where {
  dbpedia-owl:Philosopher ^a ?philosopher1, ?philosopher2 .
  filter( ?philosopher1 != ?philosopher2 )
  ?commonType ^a ?philosopher1, ?philosopher2 .
  { ?eitherType ^a ?philosopher1 } UNION
  { ?eitherType ^a ?philosopher2 }
}
group by ?philosopher1 ?philosopher2
limit 3
which produces results like this:
philosopher1                                   philosopher2                                     intersection  union  similarity
http://dbpedia.org/resource/Bawa_Muhaiyaddeen  http://dbpedia.org/resource/Abdolkarim_Soroush   6             34     0.176471
http://dbpedia.org/resource/Eric_Voegelin      http://dbpedia.org/resource/Abdolkarim_Soroush   6             30     0.2
http://dbpedia.org/resource/Eric_Ormsby        http://dbpedia.org/resource/%C3%89mile_Meyerson  18            24     0.75
From here, all you need to do is also select the birthplaces of the two philosophers, as the first query does, apply whatever formula you're using to turn them into a similarity modifier, and add that modifier to the similarity value.
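Putting the two pieces together might look something like the sketch below. It wraps the class-overlap query in a subquery and then applies the same ±.05 birthplace rule used in the first query. The ±.05 rule stands in for whatever formula you actually use, the variable name ?adjustedSimilarity is made up for illustration, and the prefixes are again assumed to be predefined by the endpoint. Pairs with several birthplaces still appear n×m times, and pairs where either philosopher has no birthPlace value drop out of the results.

```sparql
# Sketch: class-overlap similarity from a subquery, then a birthplace modifier.
select ?philosopher1 ?philosopher2 ?bp1 ?bp2 ?similarity ?adjustedSimilarity where {
  {
    # inner query: shared classes over all classes, as in the second query
    select ?philosopher1
           ?philosopher2
           (count(distinct ?commonType)/xsd:double(count(distinct ?eitherType)) as ?similarity)
    where {
      dbpedia-owl:Philosopher ^a ?philosopher1, ?philosopher2 .
      filter( ?philosopher1 != ?philosopher2 )
      ?commonType ^a ?philosopher1, ?philosopher2 .
      { ?eitherType ^a ?philosopher1 } UNION
      { ?eitherType ^a ?philosopher2 }
    }
    group by ?philosopher1 ?philosopher2
    limit 3
  }
  ?philosopher1 dbpedia-owl:birthPlace ?bp1 .
  ?philosopher2 dbpedia-owl:birthPlace ?bp2 .
  # placeholder modifier: replace with your own formula
  bind ( if( ?bp1 = ?bp2, ?similarity + .05, ?similarity - .05) as ?adjustedSimilarity )
}
```

Keeping the aggregation inside the subquery means the birthplace join cannot inflate the class counts, which it would if everything were done in a single grouped query.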