
Let's assume we have an array of arrays containing some numbers:

let arr = [[1,2,3], [5,6,9], [2,1,3], [3,2,4], [4,3,5], [6,9,5]]

Here arr[0] and arr[2] contain the same values (1, 2, 3), just in a different order.

Likewise, arr[1] and arr[5] contain the same values (5, 6, 9) in a different order.

I want to remove such duplicates and get this result:

final = [[1,2,3], [5,6,9], [3,2,4], [4,3,5]]

Any solution in JavaScript or Python would be great; other languages such as C++ are also welcome (I have very little knowledge of it), and pseudocode is helpful too.

I want to know the best Big-O complexity achievable for this.

tucuxi
1101 tech
  • What have you tried? What is its big-O complexity? – tucuxi Apr 07 '21 at 21:27
  • This question is constrained to only 2 dimensions `(a,b,c)` -> `(b-a,c-a)`? – Neil Apr 07 '21 at 21:40
  • Can you add some constraints to the problem? Can I take it to be n x m, i.e., n single arrays, each containing up to m elements? If n, m <= 10^2, it can be done in O(n * m * m): you sort each single array and then see if any array is identical to an already included array. If you want better time complexity, you can use hashing or even a bitset, which gets you O(n * m * log m) time complexity. – risingStark Apr 07 '21 at 21:44
  • _Ie_ a hash set that defined `equals` as `return (x_b-x_a == y_b-y_a) && (x_c-x_a == y_c-y_a)` would probably do just fine depending on how guaranteed you want it. – Neil Apr 07 '21 at 21:55

1 Answer


Given a 2-level array of size N * M (N sub-arrays, each containing M numbers), you have to look at all N * M numbers at least once to detect duplicates, so a hard lower bound is O(N * M). A very simple approach is to compare each sub-array against every other, at a cost of O(N * (N-1) * M * M) with 4 nested loops: there are ~N² pairs, and comparing an unsorted pair as multisets takes O(M * M). This can certainly be improved.
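A quick sketch of that naive approach (the helper names `sameMultiset` and `dedupBruteForce` are introduced here for illustration):

```javascript
// Compare two sub-arrays as multisets in O(M * M), without sorting:
// for each element of x, find and mark a not-yet-used match in y.
function sameMultiset(x, y) {
    if (x.length !== y.length) return false;
    const used = new Array(y.length).fill(false);
    for (const v of x) {
        let found = false;
        for (let j = 0; j < y.length; j++) {
            if (!used[j] && y[j] === v) { used[j] = true; found = true; break; }
        }
        if (!found) return false;
    }
    return true;
}

// O(N * N * M * M) dedup: keep a sub-array only if no earlier one matches it.
function dedupBruteForce(a) {
    return a.filter((item, i) =>
        !a.slice(0, i).some(prev => sameMultiset(prev, item)));
}

console.log(dedupBruteForce([[1,2,3],[5,6,9],[2,1,3],[3,2,4],[4,3,5],[6,9,5]]));
// keeps the first occurrence of each group: [1,2,3], [5,6,9], [3,2,4], [4,3,5]
```

This version happens to preserve input order, but it does not scale.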

One possibility is to sort the array and its sub-arrays (for instance, using lexicographical ordering of sub-array numbers). Sorting each of the N sub-arrays costs O(N * M log M), and sorting the main array, where each comparison is O(M), costs O(M * N log N), for a total of O(N * M * (log N + log M)). Once sorted, identical sub-arrays end up adjacent, and checking whether a sub-array equals its neighbour is O(M), since you only have to check whether the corresponding numbers are equal. The final O(N * M) pass that actually weeds out the duplicates is dominated by the sorting cost. This is also very close to optimal.

let arr = [[1,2,3], [5,6,9], [2,1,3], [3,2,4], [4,3,5], [6,9,5]]

function dedupBySorting(a) {
   // sort copies of the sub-arrays numerically
   // (the default sort would compare numbers as strings)
   const sorted = a.map(sub => [...sub].sort((x, y) => x - y));
   // sort the main array so that identical sub-arrays become adjacent;
   // the default sort stringifies each sub-array, a consistent ordering
   sorted.sort();
   // define equality of two sub-arrays
   const same = (x, y) => {
       if (x.length != y.length) return false;
       for (let i = 0; i < x.length; i++) {
           if (x[i] != y[i]) return false;
       }
       return true;
   };
   // keep only the 1st of each run of equal sub-arrays
   return sorted.filter((item, index) => {
     return index == 0 || !same(item, sorted[index - 1]);
   });
}

console.log(dedupBySorting(arr));

Another possibility is to use hashing: put everything into a hash set (which does not accept duplicates), and output the remaining sub-arrays. Your hash function has to be at least O(M) (otherwise it will not be very good), and it must yield the same hash regardless of the order of numbers within a sub-array. Assuming few collisions, you can build the hash set in O(N * M), which would be optimal. In my case, I sort each sub-array to build its key, incurring O(N * M * log M); other options may be faster but yield more collisions.

let arr = [[1,2,3], [5,6,9], [2,1,3], [3,2,4], [4,3,5], [6,9,5]]

function dedupByHashing(a) {
    // build a hashable key such that key([1,2,3]) == key([2,1,3]), i.e. one
    // that ignores order; note the numeric comparator (the default sort
    // compares as strings), and that this sorts the sub-arrays in place
    const key = (v) => JSON.stringify(v.sort((x, y) => x - y));
    // build the set, and use it to filter out duplicates
    let seen = new Set();
    return a.filter(item => {
        let k = key(item);
        return seen.has(k) ? false : seen.add(k);
    });
}

console.log(dedupByHashing(arr));

Note that this code does not yield your exact output: your expected result keeps the first occurrence of each duplicate group, in input order, with each sub-array's elements also in their original order. Both answers can be modified to output exactly that with minor changes that do not affect their complexity.
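For example, an order-preserving variant of the hashing approach could look like this (a sketch; `dedupKeepOrder` is a name made up here, and it keys on a sorted copy so the input is never mutated):

```javascript
// Order-preserving dedup: hash a sorted *copy* of each sub-array, so the
// original sub-arrays keep their element order.
function dedupKeepOrder(a) {
    // canonical key ignores element order but leaves the sub-array untouched
    const canonicalKey = (v) => JSON.stringify([...v].sort((x, y) => x - y));
    const seen = new Set();
    return a.filter(item => {
        const k = canonicalKey(item);
        if (seen.has(k)) return false;
        seen.add(k);
        return true;
    });
}

console.log(dedupKeepOrder([[1,2,3],[5,6,9],[2,1,3],[3,2,4],[4,3,5],[6,9,5]]));
// keeps [1,2,3], [5,6,9], [3,2,4], [4,3,5] exactly as they appear in the input
```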

Note: you may also be interested in this excellent writeup on deduplication in JavaScript; it helped me write cleaner implementations for both answers.

tucuxi