1

I have an initial array s0 which i replicate twice, giving s1 and s2. Now s1 and s2 diverged due to a concurrent insertion.

I know that s1 was obtained by inserting a character into s0 (at a random position). Similarly, s2 is derived from s0 and a random insertion. However, i don't have access to s0 anymore.

How can i combine s1 and s2 as if both additions happened on s0 on the respective positions.

As an example:

s0 = [1, 4, 9, 11]
s1 = [1, 4, 11, 9, 11]
s2 = [1, 3, 4, 9, 11]
-----
s3 = merge(s1, s2) = [1, 3, 4, 11, 9, 11]

(If both s1 and s2 were obtained by inserting a character at the same index, i don't matter what their order will be in s3, both are valid)


Since i know that s1 and s2 have exactly 2 differences, i start by searching for the index of the first difference.

var i = 0;
while (s1[i] === s2[i]) i++;

Now i know that the first difference is at index i, but i'm unable to figure out if s1[i] is the character that was added (and hence, must be added to s2 at index i) or if s2[i] is the character that was added.

Kevin
  • 2,813
  • 3
  • 20
  • 30
  • _"However, i don't have access to `s0` anymore"_ How is that possible? More importantly, how can you verify the result? – guest271314 Jan 23 '18 at 20:12
  • I thought that maybe since you know there are exactly two differences we can determine common parts and insert the characters between the common paths? Something like that :) – Kevin Jan 23 '18 at 20:12
  • @guest271314 This is in the context of research on eventual consistency, explaining everything would be far too complex. So just accept we don’t know the original state `s0`. – Kevin Jan 23 '18 at 20:15
  • Potential duplicate entries are going to give you lots of issues. Is there truly no way to store `s0` for later reference? – jmcgriz Jan 23 '18 at 20:15
  • 1
    @HyperZ No, it would not be "too complex". Are you trying to replicate how the new (compulsory for those that "accept") "chip" debit/credit cards function? That is, using quantum cryptography? – guest271314 Jan 23 '18 at 20:15
  • @guest271314 Ok, so the model i’m using is a multi value register (CRDT). So upon concurrent insertions i get the conflicting states (`s1` and `s2`) and i need to solve the conflict. The multi value register CRDT does not provide access to the original state on which the conflicts arise. – Kevin Jan 23 '18 at 20:17
  • @HyperZ If you're analyzing data that has already been collected and therefore have no way of knowing the original state, this problem makes sense. But if this is planning for an eventuality in a system that's being set up, then your best bet is to attach a database layer that tracks timestamped changes along the way to reference later. – jmcgriz Jan 23 '18 at 20:17
  • @HyperZ so, you're basically looking for a mergetool? Something like kDiff should work – jmcgriz Jan 23 '18 at 20:18
  • @jmcgriz yes i’m indeed searching for some merge tool that will combine both meaningfully, without dropping one of both insertions (i.e no last-writer wins policies). – Kevin Jan 23 '18 at 20:20
  • 1
    I don't think this question has really much to do with JavaScript. It would if you knew the algorithm and were having trouble implementing it, but this seems language independent. –  Jan 23 '18 at 20:20
  • _"which i replicate twice"_ Can you read the "state" before `s1` and `s2` are created? – guest271314 Jan 23 '18 at 20:23
  • @HyperZ See [Grabbing Edits from two strings](https://stackoverflow.com/questions/30688983/grabbing-edits-from-two-strings); also [Javascript fuzzy search that makes sense](https://stackoverflow.com/questions/23305000/javascript-fuzzy-search-that-makes-sense); though you are probably looking for [git](https://git-scm.com/) – guest271314 Jan 23 '18 at 20:54

2 Answers2

3

I would suggest this function. After finding the first index with a difference, it looks ahead to see whether the next values are consistent with either cause (i.e. an insertion in the first or second array respectively). This may not be evident from the next character, so a loop is needed to get clarity on that.

Once it is clear in which array the insertion took place, all information is there to build the final array:

function merge(s1, s2) {
    let i, j;
    for (i = 0; i < s1.length; i++) {
        if (s1[i] !== s2[i]) break;
    }
    for (j = i; j < s1.length; j++) {
        if (s1[j+1] !== s2[j]) return s2.slice(0, i + 1).concat(s1.slice(i));
        if (s1[j] !== s2[j+1]) return s1.slice(0, i + 1).concat(s2.slice(i));
    }
    // If we get here, both mutations were the "same"
    return s1[i].slice(); // ignore one.
}

const s1 = [1, 4, 11, 9, 11],
      s2 = [1, 3, 4, 9, 11];
console.log(merge(s1, s2));

Solution Not Always Unique

Sometimes there is more than one possible value for s3. For instance, if s1=[1, 2] and s2=[2, 1], there is no way to be sure whether s0=[1] or s0=[2].

If you want to detect such situations and get both solutions, then let the function return an array of solutions:

function merge(s1, s2) {
    let i, j, solutions = [];
    for (i = 0; i < s1.length; i++) {
        if (s1[i] !== s2[i]) break;
    }
    for (j = i; j < s1.length; j++) {
        if (s1[j+1] !== s2[j]) solutions.push(s2.slice(0, i + 1).concat(s1.slice(i)));
        if (s1[j] !== s2[j+1]) solutions.push(s1.slice(0, i + 1).concat(s2.slice(i)));
        if (solutions.length) return solutions;
    }
    // If we get here, both mutations were the "same"
    return [s1[i].slice()]; // ignore one.
}
const s1 = [1, 5, 4], 
      s2 = [1, 3, 4];
console.log(merge(s1, s2)); // two solutions
trincot
  • 317,000
  • 35
  • 244
  • 286
  • 1
    Could the downvoter explain the issue with this answer? – trincot Jan 23 '18 at 20:40
  • I didn't downvote, but consider the following: `merge([3, 1, 4, 9, 11], [1, 3, 4, 9, 11])` returns `[1,3,1,4,9,11]`, but swap the parameters and you get `[3,1,3,4,9,11]`. It may not be possible to determine which is correct without knowing `s0`. – Rick Hitchcock Jan 23 '18 at 20:45
  • 1
    @RickHitchcock, indeed, in that case there are two possibilities for `s0`. – trincot Jan 23 '18 at 20:50
  • First of all thank you for your effort! @RickHitchcock since both were inserted at the same position there are indeed two possibilities, both of which are correct. – Kevin Jan 23 '18 at 20:56
  • I have added a snippet to my answer that will return an array of solutions, which may have 1 or 2 elements (arrays). – trincot Jan 23 '18 at 20:59
1

One approach would be to go through them simultaneously. If they aren't the same value at a given index, do a look ahead on them both. If one has the same value in the look ahead as the other in the current position, it was probably an insertion on that one.

For example, looking through your examples above:

s1 = [1, 4, 11, 9, 11]
s2 = [1, 3, 4, 9, 11]

Index 0, both are the same. Index 1, they are different. Looking at index 2 of each, you see that s2 has the same value as the current index of s1, so s2 likely had an insertion, so add that before the value of s1. After that, line them back up again (so look at s2 one ahead`.

const s1 = [1, 4, 11, 9, 11];
const s2 = [1, 3, 4, 9, 11];
const s3 = [1, 3, 4, 11, 9, 11]; // for comparison

function merge(a, b) {
  let iA = 0;
  let iB = 0;
  let result = [];
  let abort = 0;
  
  while (abort < 100 && iA < a.length && iB < b.length) {
    if (a[iA] === b[iB]) {
      result.push(a[iA]);
      iA++;
      iB++;
    } else if (a[iA+1] === b[iB]) { // a had an insert
      result.push(a[iA]); 
      iA++;
    } else if (a[iA] === b[iB+1]) { // b had an insert
      result.push(b[iB]);
      iB++;
    } else {
      // extra stuff here
    }
    
    abort++; // this is just so no infinite loop in partially defined loop
  }
  
  // Catch values at the end
  if (iA < a.length) {
    result = result.concat(a.slice(iA));
  } else if (iB < b.length) {
    result = result.concat(b.slice(iB));
  }
    
  return result;
}

console.log(merge(s1, s2));

This code works for your particular example. Notice there is an extra else though. If you got there, it means an insertion happened in one, but either they both got an insertion at the same place, or one got multiple insertions. You'd have to loop through both and determine how to resolve, but the basic approach would be the same: loop through until you find where things line up again, and then take the ones that don't match before incrementing the iterators appropriately.


I just re-read your question a bit closer and realized that you know that only one operation is happening on each before the merge. That makes things quite a bit easier, since you don't have to loop for an arbitrary distance, you just have to go out two at most.

If you know they won't be in the same location, then you can just get rid of the else in my above code, because it'll never happen.

If they might be in the same position, since you know you only have two changes, if neither of the values match (and you know each only received one change), then both received a change at the same point. In that case, just take both. If they need a particular order in this case, you'll have to define some logic. In general though, probably doesn't matter, so just take either first.

function merge(a, b) {
  let iA = 0;
  let iB = 0;
  let result = [];
  
  while (iA < a.length && iB < b.length) {
    if (a[iA] === b[iB]) {
      result.push(a[iA]);
      iA++;
      iB++;
    } else if (a[iA+1] === b[iB]) { // a had an insert
      result.push(a[iA]); 
      iA++;
    } else if (a[iA] === b[iB+1]) { // b had an insert
      result.push(b[iB]);
      iB++;
    } else { // both received a change, take both
      result.push(a[iA]);
      result.push(b[iB]);
      iA++;
      iB++;
    }
  }
  
  // Catch values at the end
  if (iA < a.length) {
    result = result.concat(a.slice(iA));
  } else if (iB < b.length) {
    result = result.concat(b.slice(iB));
  }
    
  return result;
}

console.log(merge([1, 4, 11, 9, 11], [1, 3, 4, 9, 11])); // different places
console.log(merge([1, 3, 4, 9, 11], [1, 2, 4, 9, 11])); // same places
console.log(merge([1, 4, 9, 11, 13], [1, 3, 4, 9, 11]));
samanime
  • 25,408
  • 15
  • 90
  • 139