I have a very large array of arrays (on the order of 960,799 entries or possibly much larger). I need to process it into a new array such that:
- Each sub-array contains no duplicates.
- The main array contains no duplicate sub-arrays.
The problem is that "duplicate sub-arrays" must include arrays with the same values in a different order. In other words, if I had these sub-arrays:
[[1,2,3], [1,2,3], [3,1,2]]
They would all be considered duplicates and only one would be kept (any of them, it doesn't matter; I've been just keeping the first one; it's also fine if the order of the selected sub-array doesn't actually match, i.e. if the order of elements in the sub-array changes during processing).
My attempted solution has been to map all the sub-arrays into strings based on de-duping the sub-array, sorting it, and joining it with a delimiter. Then I de-dupe that final array, then map them back to arrays with a split. It works, but the process is extremely slow. It takes over 30 seconds for a single pass, and since the array I end up processing can grow exponentially larger, this is not acceptable. I need a more efficient algorithm.
Here's the code I'm using now that's slow (ret
is the input array):
const stringList = ret.map(list => {
return [...new Set(list)].sort().join('|');
});
const hashSet = new Set(stringList);
const output = [...hashSet].map(str => str.split('|'));
Can anyone help me get the same result more efficiently? Thanks.
EDIT
To elaborate, I'm getting these massive input arrays by calculating what is essentially the power set of some input of strings. This is the code; if it's possible to stop it from producing duplicate entries in the first place, that would work well, too, I think:
// Calculate the Cartesian product of set s
function cart(s) {
return s.reduce((acc, val) => {
return acc.map((x, i) => {
return val.map(y => {
return x.concat([y]);
});
}).flat();
}, [[]]);
}
// Use the Cartesian product to calculate the power set of set s
function pset(s) {
let ret = [];
for (let i = 0; i < s.length; ++i) {
const temp = [];
for (let j = 0; j <= i; ++j) {
temp.push([].concat(s));
}
ret = ret.concat(cart(temp));
}
return ret;
}