21

I am trying to figure out an efficient way to remove objects that are duplicates from an array and looking for the most efficient answer. I looked around the internet everything seems to be using primitive data... or not scalable for large arrays. This is my current implementation which is can be improved and want to try to avoid labels.

 Test.prototype.unique = function (arr, artist, title, cb) {
        console.log(arr.length);
        var n, y, x, i, r;
        r = [];      
        o: for (i = 0, n = arr.length; i < n; i++) {

          for (x = 0, y = r.length; x < y; x++) {

                if (r[x].artist == arr[i].artist && r[x].title == arr[i].title) {
                    continue o;
                }
            }
            r.push(arr[i]);
        }

        cb(r);
    };

and the array looks something like this:

[{title: sky, artist: jon}, {title: rain, artist: Paul}, ....]

Order does not matter, but if sorting makes it more efficient then I am up for the challenge...

and for people who do not know o is a label and it is just saying jump back to the loop instead of pushing to the new array.

Pure javascript please no libs.

ANSWERS SO FAR:

The Performance Test for the answers below: http://jsperf.com/remove-duplicates-for-loops

Lion789
  • 4,402
  • 12
  • 58
  • 96
  • Are your _Objects_ safe for _JSON_? It may be fastest to `stringify` them and compare that. **edit** which may not be best for you as may only work if the properties are defined in the same order. – Paul S. Oct 21 '13 at 17:57
  • Perhaps this question: http://stackoverflow.com/questions/3629817/getting-a-union-of-two-arrays-in-javascript – todd.pund Oct 21 '13 at 17:59
  • What do you mean by "failing to work when trying to process over 1000 results"? What happens? – mayabelle Oct 21 '13 at 18:00
  • Use jQuery! jQuery.unique( array )....... lol :) Seriously though, if you want, reference the source, and see how they handle it. – Casey ScriptFu Pharr Oct 21 '13 at 18:02
  • Nested loops aren't best for uniqueness checking. Use objects whose property names are the keys. – Barmar Oct 21 '13 at 18:04
  • @Casey that won't work with non-primitives where you're expecting different references – Paul S. Oct 21 '13 at 18:06
  • Paul, the data looks just like that, but with thousands of objects some duplicate, the properties are all in order, but i rather a more generic solution – Lion789 Oct 21 '13 at 18:15
  • mayabelle it skips some of the results and always ends up showing 200 results as oppose to 2000 for example... – Lion789 Oct 21 '13 at 18:16
  • @Paul. Ahh.. thanks, well learning something everyday. Hopefully, because if not, that means I probably died... lol – Casey ScriptFu Pharr Oct 21 '13 at 19:00

9 Answers9

30

I see, the problem there is that the complexity is squared. There is one trick to do it, it's simply by using "Associative arrays".

You can get the array, loop over it, and add the value of the array as a key to the associative array. Since it doesn't allow duplicated keys, you will automatically get rid of the duplicates.

Since you are looking for title and artist when comparing, you can actually try to use something like:

var arrResult = {};
for (i = 0, n = arr.length; i < n; i++) {
    var item = arr[i];
    arrResult[ item.title + " - " + item.artist ] = item;
}

Then you just loop the arrResult again, and recreate the array.

var i = 0;
var nonDuplicatedArray = [];    
for(var item in arrResult) {
    nonDuplicatedArray[i++] = arrResult[item];
}

Updated to include Paul's comment. Thanks!

Henrique Feijo
  • 674
  • 5
  • 10
  • 1
    `arrResult` here is a normal _Object_. You will also need a delimiter to protect `foo, bar` against `foob, ar`. +1 because this should work well for OP's case – Paul S. Oct 21 '13 at 18:08
  • Don't forget to declare 'arrResult' before the loop and use arr[i] instead of arr inside. – Mike Edwards Oct 21 '13 at 18:11
  • it should be returning the array Result (as you can tell each of those are unique) but it only returns one of the objects back... – Lion789 Oct 21 '13 at 18:53
  • 2
    @Lion789 It's a problem in your code - you're setting `arrResult` using `title` and `artist`, but your sample array has key1 and key2. http://jsfiddle.net/yKwZe/1/ – Scott Mermelstein Oct 21 '13 at 19:03
  • http://jsfiddle.net/yKwZe/3/ ok still look at the results, i changed it to artist and title – Lion789 Oct 21 '13 at 19:06
  • @Lion789 Yes, but I changed the `arrResult` area - so it now says `arrResult[item.key1 + " - " + item.key2] = item`. The version I put up works fine. I won't change this one, though, since we're obviously having dueling edits. – Scott Mermelstein Oct 21 '13 at 19:08
  • is there a way I can test both of these with an array of over a 1000 objects, I am not sure how to create a test like that, without manually creating it... Because when implementing both yours and his with my real results, yours returns duplicates and cuts the 2441 objects to 1277, but his cuts it down to 206 but I really don't think there is that many duplicates although it is possible... and if I implement my original method it returns the same amount 206 so... I am assuming it was working correctly then? – Lion789 Oct 21 '13 at 19:18
  • @Lion789 debugging will take some manual work. That's what they pay us the big bucks for. :-) Try sorting the 206 results, and compare them to the already-sorted results from my algorithm. When you find the first difference, figure out which answer is right. – Scott Mermelstein Oct 21 '13 at 19:26
  • well your answer has duplicates a 100%, it is just confusing to me that out of 2441 objects only 206 are unique... it is possible but... that means sirius xm radio played only 206 unique songs for one week... also check this out http://jsperf.com/remove-duplicates-for-loops – Lion789 Oct 21 '13 at 19:28
  • @Lion789 I saw on your jsfiddle that you are assigning values to both arr and arrResult. This algorithm as is only removes duplicates from a single array. So you need to assign the values to arr, and arrResult is only a supporting array that will store the values inside the function. Do you want to remove duplicates between 2 arrays? – Henrique Feijo Oct 22 '13 at 13:52
  • Hey, no, was just using it to compare what the results should include object wise, is the method you included though, more efficient than the way I did it... I cannot really tell from the jsPerf - when it is ran. – Lion789 Oct 22 '13 at 15:08
  • Oh I see.. You need to take in consideration, that if the records are in low number, you won't see much difference. It's when you have a really high number that you start seeing the difference. If you have 1000 records, it's 1000 iterations against 1000000. But if you have only about 100, it's 100 against 10000, which in a very fast computer can be almost the same. – Henrique Feijo Oct 22 '13 at 18:39
  • 1
    Please add the line var nonDuplicatedArray = []; for dummies like me! Thanks! This works great with that caveat. – Sevak Avakians Sep 11 '15 at 21:02
3

Here is a solution that works for me.

Helper functions:

// sorts an array of objects according to one field
// call like this: sortObjArray(myArray, "name" );
// it will modify the input array
sortObjArray = function(arr, field) {
    arr.sort(
        function compare(a,b) {
            if (a[field] < b[field])
                return -1;
            if (a[field] > b[field])
                return 1;
            return 0;
        }
    );
}

// call like this: uniqueDishes = removeDuplicatesFromObjArray(dishes, "dishName");
// it will NOT modify the input array
// input array MUST be sorted by the same field (asc or desc doesn't matter)
removeDuplicatesFromObjArray = function(arr, field) {
    var u = [];
    arr.reduce(function (a, b) {
        if (a[field] !== b[field]) u.push(b);
        return b;
    }, []);
    return u;
}

and then simply call:

        sortObjArray(dishes, "name");
        dishes = removeDuplicatesFromObjArray(dishes, "name");
Nico
  • 4,248
  • 1
  • 20
  • 19
2

Basic sort-then-unique implementation, fiddle HERE:

function unique(arr) {
    var comparer = function compareObject(a, b) {
        if (a.title == b.title) {
            if (a.artist < b.artist) {
                return -1;
            } else if (a.artist > b.artist) {
                return 1;
            } else {
                return 0;
            }
        } else {
            if (a.title < b.title) {
                return -1;
            } else {
                return 1;
            }
        }
    }

    arr.sort(comparer);
    console.log("Sorted: " + JSON.stringify(arr));
    for (var i = 0; i < arr.length - 1; ++i) {
        if (comparer(arr[i], arr[i+1]) === 0) {
            arr.splice(i, 1);
            console.log("Splicing: " + JSON.stringify(arr));
        }
    }
    return arr;
}

It may or may not be the most efficient, and should be entirely scalable. I've added some console.logs so you can see it as it works.

EDIT

In the interest of saving on the space the function used, I did that for loop at the end, but it seems likely that didn't properly find only unique results (depsite it passing my simple jsfiddle test). Please try replacing my for loop with the following:

var checker;
var uniqueResults = [];
for (var i = 0; i < arr.length; ++i) {
    if (!checker || comparer(checker, arr[i]) != 0) {
        checker = arr[i];
        uniqueResults.push(checker);
    }
}
return uniqueResults;
Scott Mermelstein
  • 15,174
  • 4
  • 48
  • 76
  • You can check http://stackoverflow.com/questions/234683/javascript-array-sort-implementation/236534#236534 for information on typical complexities for `sort`. This obviously does one additional linear pass to make it unique, and doesn't blatantly take up any extra space. – Scott Mermelstein Oct 21 '13 at 18:32
  • This seems to be working, but it is actually off by one ... http://jsfiddle.net/9GsCw/1/ – Lion789 Oct 21 '13 at 18:47
  • 1
    @Lion789 I agree. I actually upvoted Henrique's answer, which is O(n), but figured it wouldn't hurt to leave mine in. It may be helpful to someone else some other day. – Scott Mermelstein Oct 21 '13 at 18:49
  • @Lion789 It's a problem in your code - you're setting `arrResult` using `title` and `artist`, but your sample array has `key1` and `key2`. – Scott Mermelstein Oct 21 '13 at 19:02
  • ah kk, well it is not off by one anymore updated it here http://jsfiddle.net/yKwZe/4/ – Lion789 Oct 21 '13 at 19:08
  • @Lion789 Please check my change. It will probably give you a number of results that matches your other two algorithms. – Scott Mermelstein Oct 21 '13 at 19:41
  • Can you make a new jsfiddle with the whole thing because I am not sure if it works correctly... this is how I implemented it http://jsfiddle.net/9TcQF/ but nothing is printing to the console. – Lion789 Oct 21 '13 at 20:57
  • 1
    @Lion789 http://jsfiddle.net/9TcQF/1/ You weren't sorting the array, nor did you call the `unique` function. Fixed both, and we're back to getting 4 results. – Scott Mermelstein Oct 21 '13 at 21:05
  • I added your loop to the jsperf I have up top, I think your method is the fastest – Lion789 Oct 21 '13 at 21:57
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/39688/discussion-between-scott-mermelstein-and-lion789) – Scott Mermelstein Oct 21 '13 at 22:03
1

I use this function. its not doing any sorting, but produces result. Cant say about performance as never measure it.

var unique = function(a){
    var seen = [], result = [];
    for(var len = a.length, i = len-1; i >= 0; i--){
        if(!seen[a[i]]){
            seen[a[i]] = true;
            result.push(a[i]);
        }
    }
    return result;
}

var ar = [1,2,3,1,1,1,1,1,"", "","","", "a", "b"]; console.log(unique(ar));// this will produce [1,2,3,"", "a", "b"] all unique elements.

1

Below is Henrique Feijo's answer with ample explanation and an example that you can cut and paste:

Goal: Convert an array of objects that contains duplicate objects (like this one)...

[
    {
        "id": 10620,
        "name": "Things to Print"
    },
    {
        "id": 10620,
        "name": "Things to Print"
    },
    {
        "id": 4334,
        "name": "Interesting"
    }
]

... Into an array of objects without duplicate objects (like this one):

[
    {
        "id": 10620,
        "name": "Things to Print"
    },
    {
        "id": 4334,
        "name": "Interesting"
    }
]

Explanation provided in the comments:

    var allContent = [{
      "id": 10620,
      "name": "Things to Print"
    }, {
      "id": 10620,
      "name": "Things to Print"
    }, {
      "id": 4334,
      "name": "Interesting"
    }]

     //Put Objects Into As Associative Array. Each key consists of a composite value generated by each set of values from the objects in allContent.
    var noDupeObj = {} //Create an associative array. It will not accept duplicate keys.
    for (i = 0, n = allContent.length; i < n; i++) {
      var item = allContent[i]; //Store each object as a variable. This helps with clarity in the next line.
      noDupeObj[item.id + "|" + item.name] = item; //This is the critical step.
      //Here, you create an object within the associative array that has a key composed of the two values from the original object. 
      // Use a delimiter to not have foo+bar handled like fo+obar
      //Since the associative array will not allow duplicate keys, and the keys are determined by the content, then all duplicate content are removed. 
      //The value assigned to each key is the original object which is along for the ride and used to reconstruct the list in the next step.
    }

     //Recontructs the list with only the unique objects left in the doDupeObj associative array
    var i = 0;
    var nonDuplicatedArray = [];
    for (var item in noDupeObj) {
      nonDuplicatedArray[i++] = noDupeObj[item]; //Populate the array with the values from the noDupeObj.
    }

    console.log(nonDuplicatedArray)
mplungjan
  • 169,008
  • 28
  • 173
  • 236
user3303554
  • 2,215
  • 2
  • 16
  • 16
1

For those who love ES6 and short stuff, here it's one solution:

const arr = [
  { title: "sky", artist: "Jon" },
  { title: "rain", artist: "Paul" },
  { title: "sky", artist: "Jon" }
];

Array.from(arr.reduce((a, o) => a.set(o.title, o), new Map()).values());

const arr = [
  { title: "sky", artist: "Jon" },
  { title: "rain", artist: "Paul" },
  { title: "sky", artist: "Jon" },
  { title: "rain", artist: "Jon" },
  { title: "cry", artist: "Jon" }
];

const unique = Array.from(arr.reduce((a, o) => a.set(o.title, o), new Map()).values());

console.log(`New array length: ${unique.length}`)

console.log(unique)

The above example only works for a unique title or id. Basically, it creates a new map for songs with duplicate titles.

Abraham
  • 8,525
  • 5
  • 47
  • 53
0

Below code compares object with JSON as String format and removes duplicates and works fine with simple arrays.

    Array.prototype.unique=function(a){
     return function(){
        return this.filter(a)
     }
   }(
   function(a,b,c){
     var tmp=[]; 
     c.forEach(function(el){
        tmp.push(JSON.stringify(el))
    }); 
    return tmp.indexOf(JSON.stringify(a),b+1)<0
  })
Jens
  • 67,715
  • 15
  • 98
  • 113
  • I understand why no one really tried using this. Or at least give some feedback – Jay Oct 08 '14 at 13:54
0

If you are using underscore js, it is easy to remove duplicate object. http://underscorejs.org/#uniq

Mohammed Safeer
  • 20,751
  • 8
  • 75
  • 78
0
function remove_duplicates(objectsArray) {
    var arr = [], collection = []; 
    $.each(objectsArray, function (index, value) {
        if ($.inArray(value.id, arr) == -1) { 
            arr.push(value.id);
            collection.push(value);
        }
    });
    return collection;
}
Rohan Kumar
  • 40,431
  • 11
  • 76
  • 106