180

Let A and B be two sets. I'm looking for really fast or elegant ways to compute the set difference (A - B or A \B, depending on your preference) between them. The two sets are stored and manipulated as Javascript arrays, as the title says.

Notes:

  • Gecko-specific tricks are okay
  • I'd prefer sticking to native functions (but I am open to a lightweight library if it's way faster)
  • I've seen, but not tested, JS.Set (see previous point)

Edit: I noticed a comment about sets containing duplicate elements. When I say "set" I'm referring to the mathematical definition, which means (among other things) that they do not contain duplicate elements.

Matt Ball
  • 354,903
  • 100
  • 647
  • 710
  • What is this "set difference" terminology you are using? Is that from C++ or something? – Josh Stodola Nov 12 '09 at 15:44
  • What are in your sets? Depending on the type you are targetting (eg Numbers), computing a set difference can be done *really* fast and elegant. If your sets contain (say) DOM elements, you're going to be stuck with a slow `indexOf` implementation. – Crescent Fresh Nov 12 '09 at 15:53
  • @Crescent: my sets contain numbers - sorry for not specifying. @Josh: it's the standard set operation in mathematics (http://en.wikipedia.org/wiki/Set_%28mathematics%29#Complements) – Matt Ball Nov 12 '09 at 16:30
  • @JoshStodola that's the [mathematical notation for set difference](http://en.wikipedia.org/wiki/Set_(mathematics)#Complements) – Pat Aug 12 '14 at 00:21
  • @Pat you must missed that this question is approaching 4 years old. – Matt Ball Aug 12 '14 at 03:29
  • 1
    @MattBall Nope, I saw that. But Josh's question was valid and unanswered so I answered it :) – Pat Aug 14 '14 at 21:19
  • please look at http://stackoverflow.com/a/40369164/6011421 – Yurii Holskyi Nov 01 '16 at 22:04
  • Possible duplicate of [JavaScript array difference](https://stackoverflow.com/questions/1187518/javascript-array-difference) – user Sep 22 '17 at 01:58
  • TC39 proposal is currently Stage 2: https://github.com/tc39/proposal-set-methods – Jared Beck Jan 19 '21 at 06:04

15 Answers15

227

I don't know if this is most effective, but perhaps the shortest:

var A = [1, 2, 3, 4];
var B = [1, 3, 4, 7];

var diff = A.filter(function(x) {
  return B.indexOf(x) < 0;
});

console.log(diff); // [2]

Updated to ES6:

const A = [1, 2, 3, 4];
const B = [1, 3, 4, 7];

const diff = A.filter(x => !B.includes(x));

console.log(diff); // [2]
Mr. Polywhirl
  • 42,981
  • 12
  • 84
  • 132
user187291
  • 53,363
  • 19
  • 95
  • 127
  • 11
    +1: not the most efficient solution, but definitely short and readable – Christoph Nov 12 '09 at 17:37
  • 10
    Note: array.filter is not supported cross-browser (e.g. not in IE). It seems not to matter to @Matt since he stated that "Gecko-specific tricks are okay" but I think it's worth mentioning. – Eric Bréchemier Nov 13 '09 at 16:44
  • 63
    This is very slow. O(|A| * |B|) – glebm Apr 08 '13 at 00:37
  • 1
    @EricBréchemier This is now supported (since IE 9). [Array.prototype.filter](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/filter) is a standard ECMAScript feature. – Quentin Roy Jun 21 '16 at 08:14
  • what is "ES6" ? I found this syntax don't work on safari 9.1 (on 10.0 its ok) – Paolo Biavati Nov 14 '16 at 14:26
  • 1
    @PaoloBiavati ES6 is [ECMAScript](https://en.wikipedia.org/wiki/ECMAScript) 6th edition. It's the standardization of JavaScript. The 6th edition was published in June 2015. I'm not familiar with Safari's versions, but it's likely version 9.1 didn't support the syntactic sugar form of lambda functions (the "=>" operator) whereas version 10 does. – Cole Cameron Nov 16 '16 at 22:02
  • it's working for more than one itme also . A = [200000340755,200000340767, 200000340760]; B = [200000340755]; diff = A.filter(function(x) { return B.indexOf(x) < 0 }) console.log(diff); – gnganapath Mar 17 '17 at 09:25
  • 6
    In ES6, you could use [`!B.includes(x)`](https://developer.mozilla.org/en/docs/Web/JavaScript/Reference/Global_Objects/Array/includes) instead of `B.indexOf(x) < 0` :) – c24w Jun 26 '17 at 11:27
  • This doesn't work when computing difference of arrays containing objects - ie: `[{a: 5}, {b: 6}].filter(x => [{a: 5}].indexOf(x) < 0 )` gives `[{a: 5}, {b: 6}]` – eithed Nov 21 '17 at 12:55
  • @glebm That would only be true if `indexOf` would run a linear search, but benchmarks suggest that it's somehow indexed. It's pretty much as fast as a key lookup in an object. – AndreKR May 21 '18 at 10:49
  • 1
    @glebm I don't think this kind of problem can be solved in linear time at all, it will always be polynomial, unless you solved the P vs NP problem. – Wong Jia Hau Jun 27 '19 at 09:25
  • 2
    @WongJiaHau This can be done in amortized `O(|A| + |B|)` with hash sets. – glebm Sep 15 '19 at 19:19
  • This works for integers but will fail for strings. `'himom'.includes('hi') === true` – tommmm Jan 27 '23 at 22:34
177

Well, 7 years later, with ES6's Set object it's quite easy (but still not as compact as python's A - B), and reportedly faster than indexOf for large arrays:

console.clear();

let a = new Set([1, 2, 3, 4]);
let b = new Set([5, 4, 3, 2]);

let a_minus_b = new Set([...a].filter(x => !b.has(x)));
let b_minus_a = new Set([...b].filter(x => !a.has(x)));
let a_intersect_b = new Set([...a].filter(x => b.has(x))); 
let a_union_b = new Set([...a, ...b]); 

console.log(...a_minus_b);     // {1}
console.log(...b_minus_a);     // {5}
console.log(...a_intersect_b); // {2,3,4}
console.log(...a_union_b);     // {1,2,3,4,5}
Mr. Polywhirl
  • 42,981
  • 12
  • 84
  • 132
milan
  • 11,872
  • 3
  • 42
  • 49
20

Looking at a lof of these solutions, they do fine for small cases. But, when you blow them up to a million items, the time complexity starts getting silly.

 A.filter(v => B.includes(v))

That starts looking like an O(N^2) solution. Since there is an O(N) solution, let's use it, you can easily modify to not be a generator if you're not up to date on your JS runtime.

    function *setMinus(A, B) {
      const setA = new Set(A);
      const setB = new Set(B);

      for (const v of setB.values()) {
        if (!setA.delete(v)) {
            yield v;
        }
      }

      for (const v of setA.values()) {
        yield v;
      }
    }

    a = [1,2,3];
    b = [2,3,4];

    console.log(Array.from(setMinus(a, b)));

While this is a bit more complex than many of the other solutions, when you have large lists this will be far faster.

Let's take a quick look at the performance difference, running it on a set of 1,000,000 random integers between 0...10,000 we see the following performance results.

setMinus time =  181 ms
    diff time =  19099 ms

function buildList(count, range) {
  result = [];
  for (i = 0; i < count; i++) {
    result.push(Math.floor(Math.random() * range))
  }
  return result;
}

function *setMinus(A, B) {
  const setA = new Set(A);
  const setB = new Set(B);

  for (const v of setB.values()) {
    if (!setA.delete(v)) {
        yield v;
    }
  }

  for (const v of setA.values()) {
    yield v;
  }
}

function doDiff(A, B) {
  return A.filter(function(x) { return B.indexOf(x) < 0 })
}

const listA = buildList(100_000, 100_000_000); 
const listB = buildList(100_000, 100_000_000); 

let t0 = process.hrtime.bigint()

const _x = Array.from(setMinus(listA, listB))

let t1 = process.hrtime.bigint()

const _y = doDiff(listA, listB)

let t2 = process.hrtime.bigint()

console.log("setMinus time = ", (t1 - t0) / 1_000_000n, "ms");
console.log("diff time = ", (t2 - t1) / 1_000_000n, "ms");
koblas
  • 25,410
  • 6
  • 39
  • 49
  • @RonKlein fair point, updated the code to be two sets – koblas Sep 06 '21 at 10:44
  • Note that if you have a really large Set, you're limited to 2^24 items (16,777,216) and it will crash. Otherwise, this is a great solution! – Jordan Oct 28 '22 at 15:56
  • @koblas your second code snippet is broken and throws a ReferenceError yet you reverted an edit I made fixing said issue – David Tejuosho Feb 14 '23 at 06:17
  • `process.hrtime.bigint()` is a nodejs function, not a browser function and will not work in the "run code snippet" engine. The difference is in nanosecond vs millisecond timing. – koblas Feb 14 '23 at 14:36
  • That setMinus() function doesn't looks correct to be a A-B diff. Elements of B not in A should not be returned. Implemented as it is, it's a set xor rather than a diff. – Timothée Groleau Jul 19 '23 at 01:07
15

If you're using Sets, it can be quite simple and performant:

function setDifference(a, b) {
  return new Set(Array.from(a).filter(item => !b.has(item)));
}

Since Sets use Hash functions* under the hood, the has function is much faster than indexOf (this matters if you have, say, more than 100 items).

Andrew Schwartz
  • 4,440
  • 3
  • 25
  • 58
14

You can use an object as a map to avoid linearly scanning B for each element of A as in user187291's answer:

function setMinus(A, B) {
    var map = {}, C = [];

    for(var i = B.length; i--; )
        map[B[i].toSource()] = null; // any other value would do

    for(var i = A.length; i--; ) {
        if(!map.hasOwnProperty(A[i].toSource()))
            C.push(A[i]);
    }

    return C;
}

The non-standard toSource() method is used to get unique property names; if all elements already have unique string representations (as is the case with numbers), you can speed up the code by dropping the toSource() invocations.

Community
  • 1
  • 1
Christoph
  • 164,997
  • 36
  • 182
  • 240
8

The shortest, using jQuery, is:

var A = [1, 2, 3, 4];
var B = [1, 3, 4, 7];

var diff = $(A).not(B);

console.log(diff.toArray());
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
Quentin Roy
  • 7,677
  • 2
  • 32
  • 50
perhelion
  • 132
  • 1
  • 8
  • This returns an object of the difference. – Drew Baker Sep 10 '15 at 18:31
  • 2
    jQuery `not` no longer works with generic objects as of 3.0.0-rc1. See https://github.com/jquery/jquery/issues/3147 – Marc-André Lafortune Jun 09 '16 at 15:38
  • 3
    It's not a great idea to add a dependency on a ~70k 3rd party library **just** to do this, since the same thing can be accomplished in just a few lines of code as shown in the other answers here. However, if you are already using jQuery on your project this will work just fine. – Chris Barr Dec 05 '16 at 14:24
  • Though this approach has less code, but it does not provide any explanation of the space and time complexity of the the differ algorithms and the data structure it uses to perform the method. It is black boxed for developers to engineer the software with no evaluation when data scale up or with limited memory is allowed. if you use such approach with large data set, the performance might remains unknown until further research onto the source code. – Downhillski Jan 18 '17 at 03:24
  • This is just returning the amount (2 in this case) of elements of A which are not in B. Converting 2 into array is pointless... – Alex Mar 02 '18 at 12:29
6

I would hash the array B, then keep values from the array A not present in B:

function getHash(array){
  // Hash an array into a set of properties
  //
  // params:
  //   array - (array) (!nil) the array to hash
  //
  // return: (object)
  //   hash object with one property set to true for each value in the array

  var hash = {};
  for (var i=0; i<array.length; i++){
    hash[ array[i] ] = true;
  }
  return hash;
}

function getDifference(a, b){
  // compute the difference a\b
  //
  // params:
  //   a - (array) (!nil) first array as a set of values (no duplicates)
  //   b - (array) (!nil) second array as a set of values (no duplicates)
  //
  // return: (array)
  //   the set of values (no duplicates) in array a and not in b, 
  //   listed in the same order as in array a.

  var hash = getHash(b);
  var diff = [];
  for (var i=0; i<a.length; i++){
    var value = a[i];
    if ( !hash[value]){
      diff.push(value);
    }
  }
  return diff;
}
Eric Bréchemier
  • 1,908
  • 3
  • 18
  • 30
  • that's exactly the same algorithm I posted half an hour ago – Christoph Nov 12 '09 at 17:15
  • @Christoph: you are right... I failed to notice that. I find my implementation more simple to understand though :) – Eric Bréchemier Nov 13 '09 at 16:41
  • I think it is better to compute the diff outside of getDifference so it may be reused multiple times. Maybe optional like so: `getDifference(a, b, hashOfB)`, if not passed it will be computed otherwise it is reused as-is. – Christophe Roussy Apr 12 '17 at 11:54
6

Using Underscore.js (Library for functional JS)

>>> var foo = [1,2,3]
>>> var bar = [1,2,4]
>>> _.difference(foo, bar);
[4]
chribsen
  • 6,232
  • 4
  • 26
  • 24
6

Some simple functions, borrowing from @milan's answer:

const setDifference = (a, b) => new Set([...a].filter(x => !b.has(x)));
const setIntersection = (a, b) => new Set([...a].filter(x => b.has(x)));
const setUnion = (a, b) => new Set([...a, ...b]);

Usage:

const a = new Set([1, 2]);
const b = new Set([2, 3]);

setDifference(a, b); // Set { 1 }
setIntersection(a, b); // Set { 2 }
setUnion(a, b); // Set { 1, 2, 3 }
Brian Burns
  • 20,575
  • 8
  • 83
  • 77
5

Incorporating the idea from Christoph and assuming a couple of non-standard iteration methods on arrays and objects/hashes (each and friends), we can get set difference, union and intersection in linear time in about 20 lines total:

var setOPs = {
  minusAB : function (a, b) {
    var h = {};
    b.each(function (v) { h[v] = true; });
    return a.filter(function (v) { return !h.hasOwnProperty(v); });
  },
  unionAB : function (a, b) {
    var h = {}, f = function (v) { h[v] = true; };
    a.each(f);
    b.each(f);
    return myUtils.keys(h);
  },
  intersectAB : function (a, b) {
    var h = {};
    a.each(function (v) { h[v] = 1; });
    b.each(function (v) { h[v] = (h[v] || 0) + 1; });
    var fnSel = function (v, count) { return count > 1; };
    var fnVal = function (v, c) { return v; };
    return myUtils.select(h, fnSel, fnVal);
  }
};

This assumes that each and filter are defined for arrays, and that we have two utility methods:

  • myUtils.keys(hash): returns an array with the keys of the hash

  • myUtils.select(hash, fnSelector, fnEvaluator): returns an array with the results of calling fnEvaluator on the key/value pairs for which fnSelector returns true.

The select() is loosely inspired by Common Lisp, and is merely filter() and map() rolled into one. (It would be better to have them defined on Object.prototype, but doing so wrecks havoc with jQuery, so I settled for static utility methods.)

Performance: Testing with

var a = [], b = [];
for (var i = 100000; i--; ) {
  if (i % 2 !== 0) a.push(i);
  if (i % 3 !== 0) b.push(i);
}

gives two sets with 50,000 and 66,666 elements. With these values A-B takes about 75ms, while union and intersection are about 150ms each. (Mac Safari 4.0, using Javascript Date for timing.)

I think that's decent payoff for 20 lines of code.

user187291
  • 53,363
  • 19
  • 95
  • 127
j-g-faustus
  • 8,801
  • 4
  • 32
  • 42
  • 1
    you should still check `hasOwnProperty()` even if the elements are numeric: otherwise, something like `Object.prototype[42] = true;` means `42` can never occur in the result set – Christoph Nov 12 '09 at 17:31
  • Granted that it would be possible to set 42 in that way, but is there a semi-realistic use case where anyone would actually do so? But for general strings I take the point - it could easily conflict with some Object.prototype variable or function. – j-g-faustus Nov 12 '09 at 18:54
4

As for the fasted way, this isn't so elegant but I've run some tests to be sure. Loading one array as an object is far faster to process in large quantities:

var t, a, b, c, objA;

    // Fill some arrays to compare
a = Array(30000).fill(0).map(function(v,i) {
    return i.toFixed();
});
b = Array(20000).fill(0).map(function(v,i) {
    return (i*2).toFixed();
});

    // Simple indexOf inside filter
t = Date.now();
c = b.filter(function(v) { return a.indexOf(v) < 0; });
console.log('completed indexOf in %j ms with result %j length', Date.now() - t, c.length);

    // Load `a` as Object `A` first to avoid indexOf in filter
t = Date.now();
objA = {};
a.forEach(function(v) { objA[v] = true; });
c = b.filter(function(v) { return !objA[v]; });
console.log('completed Object in %j ms with result %j length', Date.now() - t, c.length);

Results:

completed indexOf in 1219 ms with result 5000 length
completed Object in 8 ms with result 5000 length

However, this works with strings only. If you plan to compare numbered sets you'll want to map results with parseFloat.

SmujMaiku
  • 603
  • 6
  • 10
4

The function below are ports of the methods found in Python's set() class and follows the TC39 Set methods proposal.

const
  union = (a, b) => new Set([...a, ...b]),
  intersection = (a, b) => new Set([...a].filter(x => b.has(x))),
  difference = (a, b) => new Set([...a].filter(x => !b.has(x))),
  symetricDifference = (a, b) => union(difference(a, b), difference(b, a)),
  isSubsetOf = (a, b) => [...b].every(x => a.has(x)),
  isSupersetOf = (a, b) => [...a].every(x => b.has(x)),
  isDisjointFrom = (a, b) => !intersection(a, b).size;

const
  a = new Set([1, 2, 3, 4]),
  b = new Set([5, 4, 3, 2]);

console.log(...union(a, b));              // [1, 2, 3, 4, 5]
console.log(...intersection(a, b));       // [2, 3, 4]
console.log(...difference(a, b));         // [1]
console.log(...difference(b, a));         // [5]
console.log(...symetricDifference(a, b)); // [1, 5]

const
  c = new Set(['A', 'B', 'C', 'D', 'E']),
  d = new Set(['B', 'D']);
  
console.log(isSubsetOf(c, d));            // true
console.log(isSupersetOf(d, c));          // true

const
  e = new Set(['A', 'B', 'C']),
  f = new Set(['X', 'Y', 'Z']);
  
console.log(isDisjointFrom(e, f));        // true
.as-console-wrapper { top: 0; max-height: 100% !important; }
Mr. Polywhirl
  • 42,981
  • 12
  • 84
  • 132
  • TypeScript converted version: https://gist.github.com/Xevion/2390702004ccd756ad7e272debed9a3c – Xevion Feb 08 '23 at 06:35
1

This works, but I think another one is much more shorter, and elegant too

A = [1, 'a', 'b', 12];
B = ['a', 3, 4, 'b'];

diff_set = {
    ar : {},
    diff : Array(),
    remove_set : function(a) { ar = a; return this; },
    remove: function (el) {
        if(ar.indexOf(el)<0) this.diff.push(el);
    }
}

A.forEach(diff_set.remove_set(B).remove,diff_set);
C = diff_set.diff;
user187291
  • 53,363
  • 19
  • 95
  • 127
Xavi Ivars
  • 634
  • 8
  • 22
0

Using core-js to polyfill the New Set methods proposal:

import "core-js"

new Set(A).difference(B)

In theory, the time complexity should be Θ(n), where n is the number of elements in B.

Nick McCurdy
  • 17,658
  • 5
  • 50
  • 82
0

Answer provided by @koblas is good but returns items that are in both arrays aswell. With a slight modification (in ES6) for my use case where I want to get the difference, (with the intention of retrieving new items in array_j, as well as the the items in array_i that are not in array j as separate output arrays, these are the 3 main ways provided to do this:

var arr_i = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"];
var arr_j = ["a", "c", "d", "f", "g", "h", "j", "k", "l", "n"];

The answers should be the new items in array j as ['b', 'e', 'i'] as well as the items in array i that are not in array j as ['k', 'l', 'n']

// Convert to Set
var set_i = new Set(arr_i);
var set_j = new Set(arr_j);

const changes = (arr1, arr2) => {
  // Using Array method
  let turn_on = arr2.filter((x) => !arr1.includes(x));
  let turn_off = arr1.filter((x) => !arr2.includes(x));
  return { turn_on, turn_off };
};

const setChanges = (set1, set2) => {
  // Using Set method
  let turn_on = new Set([...set2].filter((x) => !set1.has(x)));
  let turn_off = new Set([...set1].filter((x) => !set2.has(x)));
  return { turn_on, turn_off };
};

function* setMinus(setA, setB) {
  // Using Set method with generator by @koblas
  for (const v of setB.values()) {
    // .delete returns true if value was already in Set; otherwise false.
    if (!setA.delete(v)) {
      yield v;
    }
  }
}

const changesGenerator = (set1, set2) => {
  let turn_off = Array.from(setMinus(set2, set1));
  let turn_on = Array.from(setMinus(set1, set2));
  return { turn_on, turn_off };
};

All three methods return:

{ turn_on: [ 'k', 'l', 'n' ], turn_off: [ 'b', 'e', 'i' ] }

Timing these on random array including numbers from range [0,10000] containing 5000 items

let arr_i = Array.from({ length: 5000 }, () =>
  Math.floor(Math.random() * 10000)
);
let arr_j = Array.from({ length: 5000 }, () =>
  Math.floor(Math.random() * 10000)
);

var set_i = new Set(arr_i);
var set_j = new Set(arr_j);

console.time("Array method");
changes(arr_i, arr_j);
console.timeEnd("Array method");

console.time("Set method");
setChanges(set_i, set_j);
console.timeEnd("Set method");

console.time("Generator method");
changesGenerator(set_i, set_j);
console.timeEnd("Generator method");

Returns:

Array method: 36.894ms
Set method: 1.14ms
Generator method: 2.155ms

So yeah, just use:

let set1_minus_set2 = new Set([...set1].filter((x) => !set2.has(x)));