For example, I have an array with integers from 1 to 1,000,000. One of those numbers is duplicated inside the array. I would imagine that sorting the array numerically and then looping through to check for when a value repeats would be the most efficient way of doing this, but it seems long-winded and I'm not 100% sure. Anyone?
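In code, what I have in mind is something like this (the function name is just illustrative):

function findDuplicateBySorting(arr) {
  const sorted = [...arr].sort((a, b) => a - b); // numeric sort; copy so the input isn't mutated
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] === sorted[i - 1]) return sorted[i]; // duplicates end up adjacent after sorting
  }
}
console.log(findDuplicateBySorting([3, 1, 4, 2, 4, 5])); // 4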
-
Possible duplicate of [Get all non-unique values (i.e.: duplicate/more than one occurrence) in an array](https://stackoverflow.com/questions/840781/get-all-non-unique-values-i-e-duplicate-more-than-one-occurrence-in-an-array) – imjared Jun 20 '18 at 17:27
-
Yes, you are right; there is no better way in general. However, if you need to add new values to the array and check for duplicates, you can create a tree structure that identifies a duplicate in O(log n), but it also increases insert to O(log n) – libik Jun 20 '18 at 17:27
-
@lealceldeiro - go ahead and explain how to do that. Besides the Nobel Prize, you will get a lot of upvotes – libik Jun 20 '18 at 17:35
-
Do you want the dups returned to you or simply removed? – zfrisch Jun 20 '18 at 17:35
-
@libik, I take it back; I missed something very important. If I implemented what I just said, that would definitely be a Nobel Prize ;) Thanks for the catch! – lealceldeiro Jun 20 '18 at 17:43
-
If you are looking for duplicates... yet you know there is "one" duplicate in there... why do you not do something with the duplicate to begin with? Point being, you did something to know there is already one of something there. Let's say you try to add 5 but it says 5 is already entered: why not handle it there (see the sketch below)? Otherwise this has to be an exercise question or homework to find one? – Matt Jun 20 '18 at 18:08
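A minimal sketch of the insert-time check the comments above describe (names are illustrative), keeping a Set alongside the array for average O(1) membership tests rather than the O(log n) a tree would give:

const values = [];
const seen = new Set();

// Reject a value at insertion time if it's already present.
function tryAdd(value) {
  if (seen.has(value)) return false; // duplicate: handle it right here
  seen.add(value);
  values.push(value);
  return true;
}

console.log(tryAdd(5)); // true
console.log(tryAdd(5)); // false – 5 was already entered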
2 Answers
There is in fact a better way. You can loop over the array once and find a duplicate by using a cache of seen values, which is easy enough if you're using basic numbers, since they are valid Object keys:
[1, 2, 3, 3, 4, 5].filter(
  function (value) {
    if (this[value]) return true; // seen before: keep it, it's a duplicate
    this[value] = true;           // first sighting: cache it (returns undefined, i.e. falsy)
  },
  {} // the cache, passed as filter's thisArg
) // evaluates to [3]
If you wanted to know the index at which each duplicate occurred, you'd need to treat the cache a little differently:
var cache = {}; // the semicolon matters: without it, the array literal below is parsed as an index into {}
[1, 2, 2, 3, 4, 5, 5].filter(
  function (value, index) {
    if (this[value]) {
      this[value].push(index); // record every later index of an already-seen value
      return true;
    }
    this[value] = [index];     // first sighting: remember where it was
  },
  cache
);
Object.entries(cache); // e.g. ["2", [1, 2]] – entries whose index list is longer than 1 are duplicates
Of course, using a basic object as the cache won't always work (object keys are coerced to strings), so you may need a different data type for the cache, but the point stands: you only have to loop over the array once.
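For instance, a minimal sketch using a Set as the cache, which compares arbitrary values without the string coercion a plain object performs:

const seen = new Set();
const duplicates = [1, 2, 3, 3, 4, 5].filter(value => {
  if (seen.has(value)) return true; // already cached: it's a duplicate
  seen.add(value);                  // first sighting: remember it
  return false;
});
console.log(duplicates); // [3]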
This was an answer to a misinterpretation of the question, namely how to make sure an array only contains unique references. When I find a proper place to put it I will move it there.
There's not a better way, but in JavaScript, and for simple data types, there is a more compact way thanks to Set and Array.from():
Array.from(new Set([1,1,2,2,3,3,4,4,5])) // [1,2,3,4,5]
This does not work for arrays of similar objects, though, as that would require deeper inspection. But for simple things it works wonders. Of course, with simple objects you COULD JSON.stringify() them all before sending them through Set.
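A sketch of that stringify trick; note it assumes the objects are JSON-serializable and have their keys in a consistent order, since deduplication happens on the serialized strings:

const objs = [{ a: 1 }, { a: 1 }, { a: 2 }];
const unique = Array.from(new Set(objs.map(o => JSON.stringify(o)))) // dedupe by serialized form
  .map(s => JSON.parse(s));                                          // restore plain objects
console.log(unique); // [ { a: 1 }, { a: 2 } ]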
This of course works because Set defines keys by the data itself. As such, same values (pointer references or basic data types) are assigned the same place in the Set, so when Array.from() iterates over the set it only iterates over the unique values. This might not be fast enough for large datasets (millions of records), but it's maintainable for smaller datasets and, if nothing else, can easily simplify how the code looks.
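To illustrate that identity semantics: two references to the same object collapse to one entry, while two equal-looking but distinct objects do not:

const a = { x: 1 };
console.log(new Set([a, a]).size);               // 1 – same reference, one slot
console.log(new Set([{ x: 1 }, { x: 1 }]).size); // 2 – distinct references, despite equal contents
console.log(new Set([7, 7, "x", "x"]).size);     // 2 – primitives compare by value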

-
I missed that and will be reediting my answer after addressing other answers – Robert Mennell Jun 20 '18 at 17:49
I suppose you always have to loop through the array one way or another. Something like this (not sure how it'll work out for large arrays):
[edit 2018/10/26] Nowadays, using Set is a way to get unique Array entries. It looks pretty efficient. See the adjusted snippet (note: creating an Array of ten million elements takes a while, so be patient when running it).
let arrayUniquify = (arr = []) => [...new Set(arr)]; // Set keeps the first occurrence of each value
const largeArray = Array.from(new Array(10000000), () => Math.floor(Math.random() * 10));
console.log(`largeArray.length: ${largeArray.length} (ten million elements)`);
let start = performance.now();
const uniquified = arrayUniquify(largeArray);
console.log(`uniquify lasted ${(performance.now() - start).toFixed(2)} ms`);
console.log(`uniquified.length: ${uniquified.length}`);
console.log(`uniquified array: [${uniquified}]`);
let objTmp = {};
// remove duplicates (keep the first occurrence of each value);
// arrow functions ignore filter's thisArg, so objTmp is used via closure
const uniqueValues = ["foo", 2, 4, 5, 5, 7, 8, 9, 0, "foo", 8, 9, 5]
  .filter(v => v in objTmp ? false : (objTmp[v] = true));
console.log(uniqueValues);
// display duplicate values (every occurrence after the first)
objTmp = {};
const doubleValues = ["foo", 2, 4, 5, 5, 7, 8, 9, 0, "foo", 8, 9, 5]
  .filter(v => v in objTmp ? true : !(objTmp[v] = true));
console.log(doubleValues);
