9

I see a lot of posts about how to get the difference and symmetric difference of an array in javascript, but I haven't found anything on how to find the difference, including duplicates.

For example:

let original = [1];
let updated = [1, 1, 2];

difference(updated, original);
// Expect: [1, 2]

Is there an elegant way to do this? I'm open to solutions using plain javascript or lodash.

Thanks!

UPDATE

To clarify, an infinite number of duplicates should be supported. Another example:

let original = [1, 1];
let updated = [1, 1, 1, 1, 1, 2];

difference(updated, original);
// Expect: [1, 1, 1, 2]

UPDATE 2

I realized that there may be some confusion on the original requirements. It is true that infinite duplicates should be supported, but the order should not affect the output.

Example:

let original = [1, 1, 2];
let updated = [1, 2, 1, 1, 1];

difference(updated, original);
// Expect: [1, 1]
jdixon04
  • 1,435
  • 16
  • 28
  • `_.difference` doesn't respect duplicates and would result in `[2]` rather than `[1,2]` – jdixon04 Oct 01 '16 at 19:33
  • 1
    what do you expect in this case `let original = [1, 5]; let updated = [1, 1, 2];`? – suraj.tripathi Oct 01 '16 at 19:40
  • 1
    or [1,1,1,1] vs [1,2] ... how many `1` for duplicate? rules need more clarity – charlietfl Oct 01 '16 at 19:43
  • Thanks for your input. See above for an update that should answer your question. – jdixon04 Oct 01 '16 at 19:48
  • 1
    What about the order of the output? Any requirements? What would be the output of `difference([1, 2, 3, 3, 1], [3, 2, 1, 2])`? – trincot Oct 01 '16 at 19:55
  • 1
    Must be guaranteed that `difference(a, b)` gives the same result as `difference(b, a)` for any arrays `a` and `b`? – trincot Oct 01 '16 at 20:00
  • @trincot order of the output isn't important in this case. I would expect that the difference between the two arrays would be the same, but if it makes the solution for complex to ensure that the order of the parameters, then I can live with it. – jdixon04 Oct 01 '16 at 20:09
  • @jdixon04 Why in **UPDATE 2** the difference between `[1, 1, 2]` and `[1, 2, 1, 1, 1]` is `[1, 1]`? It's missing a 2, is not it? –  Oct 01 '16 at 20:13
  • your question is still confusing, what will be the output if `let original = [1, 1, 1, 1, 2]; let updated = [1, 2, 1, 1, 3];` – suraj.tripathi Oct 01 '16 at 20:13
  • @TheProHands The difference is `[1,1]` because `updated` already contains all the values in `original`, which leaves you with `[1,1]`. It's easier to see if you rearrange `updated` to be `[1,1,2,1,1]` and then cancel out anything in `original` – jdixon04 Oct 01 '16 at 20:16
  • in this case `let original = [1, 1, 1, 1, 2]; let updated = [1, 2, 1, 1, 3];` which one will be your answer `[1]` or `[1, 3]`? – suraj.tripathi Oct 01 '16 at 20:19
  • @suraj99934 `[1, 3]` – jdixon04 Oct 01 '16 at 20:19

6 Answers6

6

I would suggest this solution, which avoids a time complexity of O(n²):

function difference(a, b) {
    return [...b.reduce( (acc, v) => acc.set(v, (acc.get(v) || 0) - 1),
            a.reduce( (acc, v) => acc.set(v, (acc.get(v) || 0) + 1), new Map() ) 
    )].reduce( (acc, [v, count]) => acc.concat(Array(Math.abs(count)).fill(v)), [] );
}

let original = [1, 1];
let updated = [1, 1, 1, 1, 1, 2];
let res = difference(updated, original);

console.log(res);

Explanation

This solution creates a Map with a key for every distinct value of the first array (a), and as value the count of occurrences of each. Then b is added to that Map in the same way, except that the count of occurrences counts negative. If that count ends up being zero, then of course this key should not end up in the final result. In fact, the number of occurrences in the final result is the absolute value of the count in the Map for each of its keys.

Details

The code starts with:

new Map()

It is the initial value of the accumulator of the inner reduce. That reduce iterates over a and updates the count of the corresponding key in the Map. The final result of this reduce is thus a Map.

This Map then becomes the initial accumulator value for the outer reduce. That reduce iterates over b and decreases the count in the Map.

This updated Map is spread into an array with the spread operator. This array consists of 2-element sub-arrays, which are key/value pairs. Note that the value in this case is a count which could be positive, zero or negative.

This array is then iterated with the final reduce. Each count is used to create an array of that many elements (in absolute value) of the corresponding value. All this is concatenated to one array, being the return value of the function.

Follow-up Question

In comments you explained you actually needed something different, where the role of both arrays is not the same. The first array should be returned, but with the elements from the second array removed from it.

You could use this code for that:

function difference2(a, b) {
    return a.filter(function(v) {
        return !this.get(v) || !this.set(v, this.get(v) - 1);
    }, b.reduce( (acc, v) => acc.set(v, (acc.get(v) || 0) + 1), new Map() ));
}

let original = [1, 1, 2];
let updated = [1, 1];
let res = difference2(original, updated);

console.log(res);
trincot
  • 317,000
  • 35
  • 244
  • 286
  • Excellent answer and explanation. Of all the answers, only this solution and @StartdustGogeta answer worked correctly on various examples. Is there any advantage to your solution? – jdixon04 Oct 01 '16 at 20:31
  • For larger input this solution will be faster. StardustGogeta's solution will for each value, in both of the arrays, go through both arrays to find a count. This means it performs n² operations (n being the length of both arrays together). My solution will have time complexity in the order of *n* only, as it benefits from the fast `Map` operations. – trincot Oct 01 '16 at 20:35
  • Thanks for the comparison. It's been a while since I've seen anything related to time complexity, so truly appreciate the detailed overview! – jdixon04 Oct 01 '16 at 20:37
  • @trincot Nice solution! I will try timing each of ours with large inputs to show this effect, and add it to my answer. – StardustGogeta Oct 01 '16 at 20:37
  • I think this might deserve a new question as it touches on the essence. Can you explain this a bit more? Why is the value 2 not in the result? – trincot Oct 01 '16 at 20:51
  • For context, these arrays are storing `id`s. One contains the new values and one contains the original values. When I make the comparison, I need to say whether or not I'm adding records with the difference of the arrays or whether or not I'm deleting records with the difference of these two arrays. – jdixon04 Oct 01 '16 at 20:52
  • If they are ID values, why do you consider duplicates? And if you need to know what to delete, and what to add, you really need two results, not one, right? – trincot Oct 01 '16 at 20:54
  • The previous comment with the example was incorrect. Here's an updated example: `arr1 = [1, 1, 2], arr2 = [1, 1], diff(arr1, arr2) = [2], diff(arr2, arr1) = []` Essentially, I'm only want values that are different with respect to the first array. The behavior is the same of lodash's `_.difference` with the exception that duplicates are supported. For example, `arr1 = [1,2], arr2 = [1, 2, 3], _.difference(arr1, arr2) = [], _.difference(arr2, arr1) = [3]` – jdixon04 Oct 01 '16 at 20:57
  • Regarding ID values and why I need duplicates, the short answer is that I'm supporting a belongsToMany relationship using arrays rather than a join table. There's a better explanation, but it's quite long :) – jdixon04 Oct 01 '16 at 20:59
  • 1
    I updated my answer with an extra section and script for that. Next time though, you should ask such a follow-up question as a new question. The two questions were not the same. But no problem for this time :) – trincot Oct 01 '16 at 21:18
  • @trincot I just finished updating my answer with a speed test, and yours is far superior. :) However, I managed to crash the tab three times and lose all my progress due to my function's slowness. :D – StardustGogeta Oct 01 '16 at 21:19
  • 1
    Ah, nice job, @StardustGogeta -- not the crashing, but the tests of course :D – trincot Oct 01 '16 at 21:24
  • @trincot thank you! More than happy to post a follow-up question and you can post your answer. Sorry, didn't realize the requirement until you had answered! – jdixon04 Oct 01 '16 at 22:18
  • Is there any chance you could expand this to include to check array of objects e.g. `[{name: 'foo'}, {value: 'foo'}, {value: 'bar'}] vs [{name: 'test'}, {value: 'foo'}, {value: 'bar'}]` – MonteCristo Mar 02 '20 at 19:10
  • @MonteCristo, if you can't make it work, then you better post this as a new question. – trincot Mar 02 '20 at 20:00
1

function count(n,arr) {
  return arr.filter(a=>a==n).length
}

function diffBetween(arr,arr2) {
  diff = [];
  new Set(arr.concat(arr2)).forEach(
  a => {
    for(x=0;x<Math.abs(count(a,arr)-count(a,arr2));x++)
      diff.push(a)
    }
  );
  return diff;
}

console.log(diffBetween([1],[1,1,2]));
console.log(diffBetween([1,1],[1,1,1,1,1,2]));
console.log(diffBetween([1,1,3,4],[1,2,3]));

How does this work for you?

EDIT:

function difference(a, b) { // trincot's code
    return [...b.reduce( (acc, v) => acc.set(v, (acc.get(v) || 0) - 1),
            a.reduce( (acc, v) => acc.set(v, (acc.get(v) || 0) + 1), new Map() ) 
    )].reduce( (acc, [v, count]) => acc.concat(Array(Math.abs(count)).fill(v)), [] );
}

function count(n,arr) { // My code
  return arr.filter(a=>a==n).length
}

function diffBetween(arr,arr2) { // My code
  diff = [];
  new Set(arr.concat(arr2)).forEach(
  a => {
    for(x=0;x<Math.abs(count(a,arr)-count(a,arr2));x++)
      diff.push(a)
    }
  );
  return diff;
}

in1 = [1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2,2];
in2 = [1,2,3,4,5,6,1,2,3,4,5,6,7,1,1,1,1,1,1,2,2,2,2,2,2,2];

start = (new Date).getTime();
a = difference(in1,in2);
end = (new Date).getTime();
console.log("trincot done",end-start,"msec");
start = (new Date).getTime();
a = diffBetween(in1,in2);
end = (new Date).getTime();
console.log("stardust done",end-start,"msec");

@trincot's solution above is consistently faster in my testing, so his is clearly superior with large enough datasets.

StardustGogeta
  • 3,331
  • 2
  • 18
  • 32
1

So, I'd:

  • Iterate the updated array, for each element check if it's present on original array, if it's present I remove it from original array (note: in the function below I copy the original object, so I don't affect it), else I push the element to the differences array. At the end, I return the differences array.

This code is made to work on various browsers, thus I didn't use Array().indexOf and other newer methods of ECMAScript.

function difference(updated, original) {
  var i, l;
  /* copy original array */
  var degradation = [];
  for (var i = 0, ol = original.length; i < ol; ++i)
    degradation[i] = original[i]

  var diff = [];
  for (i = 0, l = Math.max(updated.length, ol); i < l; ++i) {
    var upd = updated[i];
    var index;
    var b, found;
    /* find updated item in degradation */
    for (b = 0, found = false; b < ol; ++b) {
      if (degradation[b] === upd) {
        /* remove item from degradation */
        delete degradation[b];
        found = true;
        break;
      }
    }
    if (!found)
      diff.push(upd);
  }
  return diff;
}
0
    Array.prototype.Diff = function( secondArray ) {
    var mergedArray = this.concat( secondArray );
    var mergedString = mergedArray.toString();
    var finalArray = new Array();

    for( var i = 0; i < mergedArray.length; i++ ) {
        if(mergedString.match(mergedArray[i])) {
            finalArray.push(mergedArray[i]);
            mergedString = mergedString.replace(new RegExp(mergedArray[i], "g"), "");
        }
    }
    return finalArray;
}

var let = [ 1 ];
var updated = [ 1, 1, 2 ];

console.log(let.Diff(updated));

I like the prototype way. You can save the prototype function above in a JS file and import in any page that you want, the it's possible to use as an embedded function to the object (Array for this case).

Fabricio
  • 76
  • 3
0

You might do as follows;

var original = [1, 1, 1, 1, 2],
     updated = [1, 2, 1, 1, 3],
      result = (...a) => { var [shorter,longer] = [a[0],a[1]].sort((a,b) => a.length - b.length),
                                              s = shorter.slice();
                           return shorter.reduce((p,c) => { var fil = p.indexOf(c),
                                                                fis = s.indexOf(c);
                                                            fil !== -1 && (p.splice(fil,1),s.splice(fis,1));
                                                            return p;
                                                          },longer).concat(s);
                         };
console.log(result(updated,original));
Redu
  • 25,060
  • 6
  • 56
  • 76
0

You can do it the following steps (O(n)).

Let a and b are two arrays

Step 1. create map hash_map of array a value as key and number occurrences of this key as value.

Step 2. push all the elements of array b in result which are not in a using hash_map.

Step 3. push all the elements of array a in result which are not in b using hash_map.

Here is complete code

function diff(a, b) {
    //Step 1 starts here
 var hash_map = a.reduce(function(map, key) {
  map[key] = map[key] ? (map[key]+1) : 1;
  return map;
 }, {});
    //Step 1 ends here
    //Step 2 starts here
 var result = b.filter(function(val) {
  if(hash_map[val]) {
   hash_map[val] = hash_map[val]-1;
   return false;
  }
  return true;
 });
    //Step 2 ends hers
    //Step 3 starts here
 Object.keys(hash_map).forEach(function(key) {
  while (hash_map[key]) {
   result.push(key);
   hash_map[key] = hash_map[key]-1;
  }
 });
    //Step 3 ends here
 return result;
}

console.log(diff([1],[1,1,2]));
console.log(diff([1,1,1],[1,1,1,1,1,2]));
console.log(diff([1,1,3,4],[1,2,3]));
console.log(diff([1,1,1,1,1,2], [1, 2, 1, 1, 3]));
suraj.tripathi
  • 417
  • 2
  • 15