5

Given an array of arrays, what would be the efficient way of identifying the duplicate item?

var array = [
  [
    11.31866455078125,
    44.53836644772605
  ],
  [                     // <-- Here's the duplicate
    11.31866455078125,
    44.53836644772605
  ],
  [
    11.371536254882812,
    44.53836644772605
  ],
  [
    11.371536254882812,
    44.50140292110874
  ]
]

I've been working on this with lodash as an accepted dependency, and I get how to just return the "unique" list using _.uniqWith and _.isEqual:

_.uniqWith(array,_.isEqual)

With would give the "unique" version of the list:

[ 
    [ 11.31866455078125,  44.53836644772605 ],
    [ 11.371536254882812, 44.53836644772605 ],
    [ 11.371536254882812, 44.50140292110874 ]
]

But rather than just reporting the unique elements, I need just the element that is duplicated, and ideally the index of the first occurrence.

Is this actually covered in the lodash library by some combination of methods that I'm missing? Or am I just going to have to live with writing loops to compare elements.

Probably just overtired on this, so fresh eyes on the problem would be welcome.

Trying not to rewrite functions if there are library methods that suit, so I basically am stuck with:

  1. Returning just the duplicate or at least the comparison difference with the "unique list".

  2. Basically identifying the "index of" an array within an array. Though I suppose that can be a filter reduction with _.isEqual once the duplicate item is identified.

Trying also to avoid creating an object Hash/Map and counting the occurrences of keys here as well, or at least not as a separate object, and as something that can be done functionally "in-line".

Neil Lunn
  • 148,042
  • 36
  • 346
  • 317

5 Answers5

5

Lodash gives a lot of useful functions to achieve finding the first duplicate index.
Using the _.findIndex() and _.isEqual() the following code will find the first duplicate index:

var duplicateIndex = _.findIndex(array, function(value, index, collection) {
  var equal = _.isEqual.bind(undefined, value);
  return _.findIndex(collection.slice(0, index), equal) !== -1;
});

or a bit faster but more verbose:

var duplicateIndex = _.findIndex(array, function(value, index, collection) {
  var equal = _.isEqual.bind(undefined, value);
  return _.findIndex(collection, function(val, ind) {
     return ind < index && equal(val);
  }) !== -1;
});

Notice that if no duplicate exists, -1 will be returned.
In a few words the algorithm iterates through array and looks back if the current element does not exist already. If it does, just return the current iteration index.
Please check the working demo.

Dmitri Pavlutin
  • 18,122
  • 8
  • 37
  • 41
  • On further look I found my typo, and took a good look at the code and understand what you are doing here. Can't say I'm overly happy with using `.slice()` to keep growing a list, but it does feel cleaner than just indexed loops. Mulling it over. – Neil Lunn Apr 22 '16 at 08:03
  • @NeilLunn `_.findIndex(collection.slice(0, index), equal) !== -1;` can be reduced to a manual `findIndex` to iterate only once. But the current approach is to be compact. – Dmitri Pavlutin Apr 22 '16 at 08:16
  • Kind of where I was thinking. You got my vote anyway. I'm still just clearing my head and considering options. Like I said, it's a cleaner coded approach than others. – Neil Lunn Apr 22 '16 at 08:18
2

You can just use plain javascript to do that, it's not that hard, here is my implementation

for (let i = 0; i < array.length; i++) {
  for (let j = i + 1; j < array.length; j++) {
  
     // quick elimination by comparing sub-array lengths
     if (array[i].length !== array[j].length) {
        continue;
     }
     // look for dupes
     var dupe = true;
     for (var k = 0; k < array[i].length; k++) {
       if (array[i][k] !== array[j][k]) {
         dupe = false;
         break;
       }
     }
     // if a dupe then print
     if (dupe) {
         console.debug("%d is a dupe", j); 
     }
   }
 }

The nice part about this implementation is that it will print you multiple times that an array at an index is a dupe for multiple dupes, you can use that fact to count your dupes in each index!

This is actually a very efficient way to do this because the inner for loop (j) always runs from the next position of the outer loop (i). so you half your check count.

And here is a plunk

svarog
  • 9,477
  • 4
  • 61
  • 77
2

Here's an approach that uses uniqWith(), and difference():

_.indexOf(array, _.head(_.difference(array, _.uniqWith(array, _.isEqual))));

The basic idea is:

  1. Use uniqWith() to remove the duplicates from array.
  2. Use difference() to compare array against the duplicate-free version. This gets us an array of the duplicates.
  3. Use head() to get the first item of the array. This is the duplicate that we're interested in.
  4. Use indexOf() to find the index of the duplicate, in this case it's 1.

However, if you need the index of the original, and not it's duplicate, we have to make some adjustments:

var duplicate = _.head(_.difference(array, _.uniqWith(array, _.isEqual)));
_.findIndex(array, _.unary(_.partial(_.isEqual, duplicate)));

We're still using uniqWith(), and difference() to find the duplicate. But now, we're using findIndex() to get the index. The reason is that we need to use isEqual() to find the first position of the duplicate, not the second. We construct the predicate using partial() and unary(). The result this time is 0.

Adam Boduch
  • 11,023
  • 3
  • 30
  • 38
  • I swear this was exactly the first thing I tried, since it made logical sense. But I think my brain went to using `_.differenceWith()` and the same `_.isEqual` where just a plain `_.difference()` was all that was needed. Overthought it can then got turned away. Nice approach on index matching as well. – Neil Lunn Apr 22 '16 at 21:02
1

I don't know how to do this other than to just write the algorithm yourself. Both this answer and the other posted ones aren't very efficient but should be fine:

function findIndex(array, startingIndex, value) {
  var predicate = _.partial(_.isEqual, value);
  var arraySubset = array.slice(startingIndex+1);
  var index = arraySubset.findIndex(predicate);
  return index === -1 ? index : index+startingIndex+1;
}

function findDuplicates(array) {
  return array.map((value, index) => {
    return {
      value,
      index: findIndex(array, index, value)
    };
  }).filter(info => info.index !== -1);
}

findDuplicates([1, 2, 3, 4, 1, [ 3 ], [ 4 ], [ 3 ] ]);

// [ { value: 1, index: 4 }, { value: [ 3 ], index: 7 } ]    // [ { value: 1, index: 4 }, { value: [ 3 ], index: 7 } ]

This basically creates a map of the array, calling .findIndex() on the remainder of the array, noting down the index of any duplicates, returning information about every item that has a duplicate and what the index of the duplicate is.

One nice thing about this is that it will work for triplicates or any amount of occurrences of a value.

Alan
  • 408
  • 3
  • 12
1

I believe constructing a LUT is one of the most efficient ways when it comes to making comparisons. The following method constructs a LUT by utilizing Array.prototype.reduce() and eventually mutates the original array by removing not only one but all duplicate elements regardless of how many there are.

var arr = [
  [
    11.31866455078125,
    44.53836644772605
  ],
  [
    11.31866455078125,
    44.53836644772605
  ],
  [
    11.371536254882812,
    44.53836644772605
  ],
  [
    11.371536254882812,
    44.50140292110874
  ]
];
arr.reduce((p,c,i)=> { var prop = c[0]+"" + c[1]+"";
                       p[prop] === void 0 ? p[prop] = i : p.dups.push(i);
                       return p;
                     },{dups:[]}).dups.reverse().forEach( i => arr.splice(i,1))

document.write('<pre>' + JSON.stringify(arr, 0, 2) + '</pre>');

However if you would like to have a new array by keeping the original then obviously it would be much faster procedure.

Redu
  • 25,060
  • 6
  • 56
  • 76