
I have a specific instance and a SPARQL query that retrieves other instances similar to it. By similar, I mean that the other instances share at least one property and value with the specific instance, or share a class with it.

Now, I'd like to extend the query such that if the specific instance has a value for a "critical" property, then the only instances that are considered similar are those that also have that critical property (as opposed to just having at least one property and value in common).

For instance, here is some sample data in which instance1 has a value for predicate2, which is a subproperty of isCriticalPredicate.

@prefix : <http://example.org/rs#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:instance1  a   :class1.
:instance1  :predicate1 :instance3.
:instance1  :predicate2 :instance3.  # (@) critical property

:instance2  a   :class1.
:instance2  :predicate1 :instance3.

:instance4  :predicate2 :instance3.

:predicate1 :hasSimilarityValue 0.6.

:predicate2 rdfs:subPropertyOf   :isCriticalPredicate.
:predicate2 :hasSimilarityValue 0.6.

:class1 :hasSimilarityValue 0.4.

Here is a query in which ?x is the specific instance, instance1. The query retrieves just instance4, which is correct. However, if I remove the critical property from instance1 (line labeled with @), I get no results, but should get instance2, since it has a property in common with instance1. How can I fix this?

PREFIX : <http://example.org/rs#>

select ?item (SUM(?similarity * ?importance * ?levelImportance) as ?summedSimilarity)
       (group_concat(distinct ?becauseOf ; separator = " , ") as ?reason)
where {
  values ?x {:instance1}
  bind (4/7 as ?levelImportance)
  {
    values ?instanceImportance {1}
    ?x     ?p  ?instance.
    ?item  ?p  ?instance.
    ?p  :hasSimilarityValue ?similarity
    bind (?p as ?becauseOf)
    bind (?instanceImportance as ?importance)
  }
  union
  {
    values ?classImportance {1}
    ?x     a   ?class.
    ?item  a   ?class.
    ?class  :hasSimilarityValue ?similarity
    bind (?class as ?becauseOf)
    bind (?classImportance as ?importance)
  }
  filter (?x != ?item)

  ?x     :isCriticalPredicate ?y.
  ?item  :isCriticalPredicate ?y.
}
group by ?item
Joshua Taylor
Ania David

1 Answer


I've said it before and I'll say it again: minimal data is very helpful. From your past questions, I know that your working project has similarity values on properties and the like, but none of that really matters for the problem at hand. Here's some data that just has a few instances, property values, and one property designated as critical:

@prefix : <urn:ex:> .

:p a :criticalProperty .

:a :p :u ;   #-- :a has a critical property, so 
   :q :v .   #-- only :d can be similar to it.

:c :q :v ;   #-- :c has no critical properties, so
   :r :w .   #-- both :a and :d can be similar to it.

:d :p :u ;
   :q :v .

The trick in a query like this is to filter out the results that have the problem, not to try to select the ones that don't. Logically, those mean the same thing, but in writing the query, it's easier to think about constraint violation, and to try to filter out the results that violate the constraint. In this case, you want to filter out any results where the instance has a critical property and value but the similar instance doesn't.

prefix : <urn:ex:>

select ?i (group_concat(distinct ?j) as ?js) where {

  #-- :a has a critical property, but
  #-- :c does not, so these are useful
  #-- starting points
  values ?i { :a :c }

  #-- get ?j instances that have a value
  #-- in common with ?i.
  ?i ?property ?value .
  ?j ?property ?value .

  #-- make sure that ?i and ?j aren't
  #-- the same instance
  filter (?i != ?j)

  #-- make sure that there is no critical
  #-- property value that ?i has that
  #-- ?j does not also have
  filter not exists {
    ?i ?criticalProperty ?criticalValue .
    ?criticalProperty a :criticalProperty .
    filter not exists {
      ?j ?criticalProperty ?criticalValue .
    }
  }
}
group by ?i
----------------------------
| i  | js                  |
============================
| :a | "urn:ex:d"          |
| :c | "urn:ex:d urn:ex:a" |
----------------------------
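
As a sketch of how the same technique might be folded back into the question's full query: the two mandatory `:isCriticalPredicate` triple patterns are replaced by the nested `filter not exists`. This assumes that critical properties are marked with `rdfs:subPropertyOf :isCriticalPredicate`, as in the question's sample data, and it matches that triple explicitly rather than relying on RDFS inference. It is an illustration, not a tested drop-in replacement.

```sparql
PREFIX :     <http://example.org/rs#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?item (SUM(?similarity * ?importance * ?levelImportance) as ?summedSimilarity)
       (group_concat(distinct ?becauseOf ; separator = " , ") as ?reason)
where {
  values ?x { :instance1 }
  bind (4/7 as ?levelImportance)
  {
    #-- similar because of a shared property value
    values ?instanceImportance { 1 }
    ?x    ?p ?instance .
    ?item ?p ?instance .
    ?p :hasSimilarityValue ?similarity .
    bind (?p as ?becauseOf)
    bind (?instanceImportance as ?importance)
  }
  union
  {
    #-- similar because of a shared class
    values ?classImportance { 1 }
    ?x    a ?class .
    ?item a ?class .
    ?class :hasSimilarityValue ?similarity .
    bind (?class as ?becauseOf)
    bind (?classImportance as ?importance)
  }
  filter (?x != ?item)

  #-- reject ?item if ?x has some critical property value
  #-- that ?item does not also have (assumption: critical
  #-- properties are subproperties of :isCriticalPredicate)
  filter not exists {
    ?x ?criticalProperty ?criticalValue .
    ?criticalProperty rdfs:subPropertyOf :isCriticalPredicate .
    filter not exists {
      ?item ?criticalProperty ?criticalValue .
    }
  }
}
group by ?item
```

Against the question's sample data, this should keep only :instance4 while :instance1 has the critical value, and fall back to :instance2 once the line marked (@) is removed, because the outer `filter not exists` then has nothing to object to.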

Related

There are some other questions that also touch on constraint satisfaction/violation that might be useful reading. While not all of these use nested filter not exists, most of them do have a pattern of filter not exists { … filter <negative condition> }.

Joshua Taylor
  • I'm trying to understand your answer, but first of all, the data that I gave you is the data that I'm working on, I swear. I don't normally test on real data, because not all of the data that I have now can exercise the features that I want. I am trying to understand the answer. Thank you very much, and I will come back to you. – Ania David Mar 03 '16 at 13:48
  • @AniaDavid I know the data that you showed is the data that you're working with. My point is that the data that you're working with is **more complex** than the data that you need to illustrate the problem. The fact that you're posting your more complex data suggests that you haven't tried to isolate the problem and distill it down to the bare minimum. I started from scratch and wrote up some test data that's enough to illustrate the problem. That's **good debugging practice**, and it's usually expected that in *your debugging before asking the question*, you'll have done the same. – Joshua Taylor Mar 03 '16 at 13:52
  • @AniaDavid It might be helpful to have a look at [How to create a Minimal, Complete, and Verifiable example](http://stackoverflow.com/help/mcve). E.g., the suggestion to "**Restart from scratch.** Create a new program, adding in only what is needed to see the problem. This can be faster for vast systems where you think you already know the source of the problem." and "Eliminate any issues that aren't relevant to the problem." – Joshua Taylor Mar 03 '16 at 13:54
  • So, to recap: the issue you were having really only affects the first part of your query (where you're finding similar values based on the existence of a critical property), not the class-based part, so I stripped that out. The retrieval of similar instances doesn't require the similarity annotations, so I stripped those out. All that's left is "instances have some properties, and some properties are critical; how do I find the right corresponding instances?" You'd take this answer, see the technique it uses, and re-incorporate it into your actual code. – Joshua Taylor Mar 03 '16 at 13:57
  • I understood your point, you are right. I should try on real data as well. (To be honest, I don't have real data yet; I just created music ontologies because I like that domain, and it is the domain for the real project, but I don't have the real ontology/data yet.) As for the minimal example, you are right, and honestly I do the same, I always try to simplify the problem, but as you said, I should have worked harder on the question to show you that. – Ania David Mar 03 '16 at 14:14
  • I understand the solution now, it is really clever, thanks. However, I changed your `?criticalProperty a :criticalProperty` to `?criticalProperty rdfs:subPropertyOf :isCriticalPredicate`, because I think that `a` (rdf:type) implies that `criticalProperty` is a class, so it may be more precise to say subPropertyOf. – Ania David Mar 03 '16 at 14:15
  • @AniaDavid Of course, if you're going to do that, then you probably shouldn't call the superproperty "isCriticalPredicate". Remember that "p is a subproperty of q" means if "A p B" then "A q B". Suppose you say that "hasName is a subproperty of isCriticalPredicate". Then from "A hasName 'John'" you can infer that "A isCriticalPredicate 'John'", which doesn't make any sense. Maybe it would make more sense to declare a boolean-valued annotation property (like you're already doing for similarity values) so that you can say "p isCriticalProperty true". – Joshua Taylor Mar 03 '16 at 14:18
  • I see your point. Instead of isCriticalProperty with a boolean range, I could just change its name to CriticalProperty, and then I can annotate any other properties by making them subproperties of CriticalProperty. I came up with this idea for similarities. For instance, if two books talk about Scotland and their content is similar, but one is written in Japanese and the other in English, I should never suggest them together, because writtenInLanguage is a CriticalProperty. So if the first book has a value for writtenInLanguage, the other book must have the same value for it as well. – Ania David Mar 03 '16 at 14:25
  • Kindly is what you mean by "filter not exist" **remove all instances that don't satisfy the graph patter inside the filter not exist clause** ? – Ania David Mar 04 '16 at 16:58
  • *"Kindly is what you mean by "filter not exist" remove all instances that don't satisfy the graph patter inside the filter not exist clause ?"* I don't understand what you're asking. All I'm saying is that it can help to approach the query from the perspective of "how can I exclude the results I don't want" rather than "how can I include the results I do want". E.g., suppose it were hard to check whether a number is even in SPARQL, but easy to check whether a number is odd. You want even numbers. Then you write `select ?x where { ?x a number . filter ( !isOdd(?x) ) }`. – Joshua Taylor Mar 04 '16 at 17:15
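
The boolean-annotation modelling suggested in the comments above could be combined with the answer's query roughly as follows. This is only a sketch: the property name `:isCriticalProperty` and the replacement triple `:p :isCriticalProperty true .` (standing in for `:p a :criticalProperty .` in the answer's test data) are assumptions for illustration, not something the answer itself defines.

```sparql
prefix : <urn:ex:>

#-- Assumed data change (hypothetical property name):
#--   :p :isCriticalProperty true .

select ?i (group_concat(distinct ?j) as ?js) where {
  values ?i { :a :c }

  #-- candidate ?j instances share some property value with ?i
  ?i ?property ?value .
  ?j ?property ?value .
  filter (?i != ?j)

  #-- exclude ?j when ?i has a value for a property flagged
  #-- as critical that ?j does not also have
  filter not exists {
    ?i ?criticalProperty ?criticalValue .
    ?criticalProperty :isCriticalProperty true .
    filter not exists {
      ?j ?criticalProperty ?criticalValue .
    }
  }
}
group by ?i
```

Only the pattern that identifies critical properties changes. The exclusion logic is the same nested filter not exists as in the answer, and flagging properties this way avoids the unintended "A isCriticalPredicate 'John'" inference that the subproperty approach would license.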