I have a dataset wherein I'm attempting to replace an individual that matches certain criteria with another individual. In the minimal example provided, I am looking to replace x-data://old with x-data://new.

Example input dataset:

<x-data://new> <x-dom://betterThan>  <x-data://old> .
<x-data://o0>  <x-data://p>          <x-data://old> .
<x-data://old> <x-data://p>          <x-data://o1> .

Example of desired output dataset:

<x-data://new> <x-dom://betterThan>  <x-data://new> .
<x-data://o0>  <x-data://p>          <x-data://new> .
<x-data://new> <x-data://p>          <x-data://o1> .

I have attempted to do this through the following query:

DELETE {
  ?s ?p ?o .
}
INSERT {
  ?ns ?p ?no .
}
WHERE
  { { SELECT  ?new ?old
      WHERE
        {
          ?new <x-dom://betterThan> ?old    .
          FILTER( !sameTerm( ?new, ?old ) ) .
        }
      LIMIT   1
    }
      { ?old ?p ?o
        BIND(?old AS ?s)
        BIND(?new AS ?ns)
        BIND(?o AS ?no)
      }
    UNION
      { ?s ?p ?old
        BIND(?old AS ?o)
        BIND(?s AS ?ns)
        BIND(?new AS ?no)
      }
  }

This query, however, does not insert any triples into the graph. It does delete all of the triples one would expect. According to Andy Seaborne on the Jena Dev list (when I erroneously flagged this as a bug):

?new is not in-scope at that point - it does not flow in from the sub-query higher up. Logically, each block is executed independently and the results combined up the tree. The {SELECT} is executed, the UNION is executed separately, then the results joined.

So ?ns is not defined and hence the INSERT on "?ns ?np ?no" is not a legal triple and is skipped (c.f. CONSTRUCT).

Try executing the WHERE part as a SELECT * query to see more.

This makes sense and executing the suggested SELECT query was illustrative:

-----------------------------------------------------------------------------------------------------------------------------
| new            | old            | p                    | o              | s              | ns             | no            |
=============================================================================================================================
| <x-data://new> | <x-data://old> | <x-data://p>         | <x-data://o1>  | <x-data://old> |                | <x-data://o1> |
| <x-data://new> | <x-data://old> | <x-dom://betterThan> | <x-data://old> | <x-data://new> | <x-data://new> |               |
| <x-data://new> | <x-data://old> | <x-data://p>         | <x-data://old> | <x-data://o0>  | <x-data://o0>  |               |
-----------------------------------------------------------------------------------------------------------------------------
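
For anyone reproducing the table above: the diagnostic query is presumably just the WHERE clause of the update wrapped in SELECT *, along the lines of the sketch below. The comments are additions that mark where the scoping problem shows up.

```sparql
SELECT *
WHERE
  { { SELECT  ?new ?old
      WHERE
        {
          ?new <x-dom://betterThan> ?old    .
          FILTER( !sameTerm( ?new, ?old ) ) .
        }
      LIMIT   1
    }
      { ?old ?p ?o
        BIND(?old AS ?s)
        BIND(?new AS ?ns)   # ?new is not in scope in this block, so ?ns stays unbound
        BIND(?o AS ?no)
      }
    UNION
      { ?s ?p ?old
        BIND(?old AS ?o)
        BIND(?s AS ?ns)
        BIND(?new AS ?no)   # same problem: ?no stays unbound
      }
  }
```

The empty ns and no cells in the table correspond to the two BINDs that read ?new before it has been joined in from the sub-select.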

In light of this, I'd like to restructure the query above so that it performs the desired replacement. Though this smells like a common use case, I haven't been successful in finding an existing query for a replacement operation.

EDIT July 11, 2014

This Answer to the same question almost satisfies this, but needs to be restructured to be in the form of a DELETE-INSERT query.

Rob Hall
  • What's the purpose of matching `?new` and `?old` in a sub-`select`? Is it just to limit yourself to one result? E.g., why can't you just use `?new :betterThan ?old . { … } union { … }`? – Joshua Taylor Apr 12 '14 at 19:27
  • Andy's answer on [this answers.semanticweb.com question](http://answers.semanticweb.com/questions/24345/sparql-query-to-delete-uri-as-subject-pred-or-object) mentions that you can combine multiple operations. I wonder if something with that technique can be applied here… – Joshua Taylor Apr 12 '14 at 19:30
  • @JoshuaTaylor If I don't limit it to one result, then I believe that cases where there are chains of `?newest :betterThan ?old. ?old :betterThan ?oldest.` will result in inconsistencies because all allowable bindings are generated before the query executes. For example, replacing `?old` with `?new`, then replacing `?oldest` with `?old` results in there no longer existing a link from `?new` to `?old` and not being able to mend the document. – Rob Hall Apr 14 '14 at 13:59
  • @JoshuaTaylor regarding [Andy's Answer](http://answers.semanticweb.com/questions/24345/sparql-query-to-delete-uri-as-subject-pred-or-object/24358), I don't believe that I could propagate my select criteria through all three queries without leaving SPARQL and doing some Jena-Fu. While that is an acceptable workaround (this post _is_ tagged Jena), I'd like to lean away from that, if possible. – Rob Hall Apr 14 '14 at 14:05
  • OK, but if there is more than one `:betterThan` triple, how do you know that you're getting the one that you want? There's no ordering condition or anything in your subselect at the moment… – Joshua Taylor Apr 14 '14 at 14:11
  • @JoshuaTaylor at the moment, if any single element that satisfies that clause were picked individually, then the desired rename would not cause any invalidation within the document. For example, regardless of which (valid) pair were picked (in my previous comment), then a chain would exist that would still allow `?newestRemaining :betterThan ?oldestRemaining` to bind. I'm fine with running this query until there are no longer candidates for replacement. – Rob Hall Apr 14 '14 at 14:24
  • Are you sure? If you have `3 > 2 . 2 > 1` (where `>` is `:betterThan`) and you pick `2 > 1` first, then don't you replace `2 > 1` with `2 > 2`, and then you could simply be iterating on `2 > 2` endlessly? – Joshua Taylor Apr 14 '14 at 14:35
  • As a quite possibly poorly-performing option, could you do the two other queries as sub-selects as well, and thus join on the `?old` and `?new` that you choose with the first subselect? – Joshua Taylor Apr 14 '14 at 14:36
  • @JoshuaTaylor we restrict the query so that `FILTER( !sameTerm( ?nodeReplacing, ?nodeBeingReplaced) )` in practice. This is a good catch in terms of oversight in the example. I'll edit it to match. – Rob Hall Apr 14 '14 at 14:38
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/50621/discussion-between-joshua-taylor-and-rob-hall) – Joshua Taylor Apr 14 '14 at 14:40
  • @JoshuaTaylor I believe that the different operations of such an aggregate UPDATE query are treated as separate queries, so that once some of the destructive operations from a single query took place, the criteria to match the node of interest would no longer hold. Additionally, we wouldn't have certainty that the engine matched the same set of bindings for each query (and thus operate on different nodes at different times). – Rob Hall Apr 14 '14 at 14:43
  • Did you make any progress on this? – Joshua Taylor Jun 12 '14 at 02:08
  • This is still an open question. The Jena-specific workaround so far has been to select a single set of candidate elements for replacement, use the Jena API to add/remove graph statements, and then loop until there are no results selected as candidates for replacement. – Rob Hall Jun 17 '14 at 13:47
  • Would the trivial modification that turns `construct { ?ss ?p ?oo } where { ?s ?p ?o # more stuff }` into `delete { ?s ?p ?o } insert { ?ss ?p ?oo } where { ?s ?p ?o # more stuff }` work? It touches *every* triple, so it's not wonderfully efficient, but if most triples need to be modified, it's not that bad. – Joshua Taylor Jul 11 '14 at 15:49
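
Following the scoping explanation quoted in the question, one possible restructuring keeps the single sub-select but moves every BIND that depends on ?new out of the UNION branches, to a point after the join where ?new is bound. This is a sketch that has not been run against Jena here; the use of IF and sameTerm to pick the replacement term is an assumption about how to express it, not something taken from the thread.

```sparql
DELETE { ?s  ?p ?o  . }
INSERT { ?ns ?p ?no . }
WHERE
  { { SELECT  ?new ?old
      WHERE
        { ?new <x-dom://betterThan> ?old .
          FILTER( !sameTerm( ?new, ?old ) )
        }
      LIMIT   1
    }
    # Collect every triple that mentions ?old as subject or as object.
    # Only ?old is used inside the branches, and each branch binds it itself.
      { ?old ?p ?o
        BIND(?old AS ?s)
      }
    UNION
      { ?s ?p ?old
        BIND(?old AS ?o)
      }
    # These BINDs run after the join with the sub-select, so ?new is in scope.
    BIND( IF( sameTerm(?s, ?old), ?new, ?s ) AS ?ns )
    BIND( IF( sameTerm(?o, ?old), ?new, ?o ) AS ?no )
  }
```

Because of the LIMIT 1, this still replaces only one ?old per run, so it would have to be repeated until no candidates remain, as discussed in the comments above.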

1 Answer

Here's a modification of this answers.semanticweb.com answer that may work for you. The idea is to match every triple in the graph and compute a new subject and a new object for it where replacements are available, leaving the terms unchanged otherwise. The unfortunate side effect is that every triple in the graph is deleted and then re-inserted, either in updated form or exactly as it was, so you are doing some unnecessary work.

I suppose that you could get around this by binding a boolean in each optional (e.g., bind(true as ?replacedSubject)) and then adding an outermost filter such as filter( ?replacedSubject || ?replacedObject ). Then you'd only match those triples where either the subject or the object needed to be replaced, and you wouldn't do the useless work.

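delete {  ?s ?p  ?o }
insert { ?ss ?p ?oo }
where {
  # Visit every triple in the graph.
  ?s ?p ?o 

  # If anything is betterThan the subject, pick one such replacement.
  optional {
    select ?s (sample(?new) as ?news) where {
      ?new <x-dom://betterThan> ?s 
      filter( !sameTerm( ?new, ?s ) )
    }
    group by ?s
  }

  # Likewise for the object.
  optional {
    select ?o (sample(?new) as ?newo) where {
      ?new <x-dom://betterThan> ?o 
      filter( !sameTerm( ?new, ?o ) )
    }
    group by ?o
  }

  # Fall back to the original term when no replacement was found.
  bind( coalesce(?news,?s) as ?ss )
  bind( coalesce(?newo,?o) as ?oo )
}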
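
The filtering idea mentioned above can be sketched as follows. Instead of binding extra boolean flags, this variant tests the sampled replacements directly with bound(), which is a substitution on my part and should behave equivalently as far as I can tell; treat it as an untested sketch.

```sparql
delete {  ?s ?p  ?o }
insert { ?ss ?p ?oo }
where {
  ?s ?p ?o

  optional {
    select ?s (sample(?new) as ?news) where {
      ?new <x-dom://betterThan> ?s
      filter( !sameTerm( ?new, ?s ) )
    }
    group by ?s
  }

  optional {
    select ?o (sample(?new) as ?newo) where {
      ?new <x-dom://betterThan> ?o
      filter( !sameTerm( ?new, ?o ) )
    }
    group by ?o
  }

  # Keep only triples for which at least one replacement was found,
  # so untouched triples are neither deleted nor re-inserted.
  filter( bound(?news) || bound(?newo) )

  bind( coalesce(?news, ?s) as ?ss )
  bind( coalesce(?newo, ?o) as ?oo )
}
```

The pattern still scans every triple in the graph; the filter only limits which of them end up in the delete and insert templates.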
Joshua Taylor