0

If i have a multidimensional array like: [[a,b],[a,c],[b,a],[b,c],[c,a],[c,b]] how can i go through and remove repeats where [a,b] is the same as [b,a].

also, the array is actually massive, in the tens of thousands. A for loop would have to be done backwards because the array length will shrink on every iteration. Im not even sure that an each loop would work for this. I really am at a loss for just a concept on how to begin.

Also, i tried searching for this for about an hour, and i don't even know how to phrase it.

  • possible duplicate of [Remove Duplicates from JavaScript Array](http://stackoverflow.com/questions/9229645/remove-duplicates-from-javascript-array) - the second answer here under the "Unique By" heading should help – Rhumborl Jan 01 '15 at 21:06
  • @Rhumborl no, this isn't a duplicate of that question. He doesn't want to remove entries that appear more than once, he just wants to filter out some entries based on the described criteria. – Pointy Jan 01 '15 at 21:08
  • 2
    It's not at all clear what the significance is of "a", "b", etc. repeating in your example. All I can tell for sure is that you don't want tuples where the first element is the same as the second element. – Pointy Jan 01 '15 at 21:15
  • You can try the method mentioned in that other question, but may be very, very slow. What are the values in the tuples? Does the order of the list need to be preserved? – Pointy Jan 01 '15 at 21:22
  • speed is not really a concern, but it does have to do it eventually. I am actually working with a very large set of arrays, in the 10s of thousands. The values are actually objects. I guess i oversimplified with using only `a,b,c` because i thought it was a simple thing i was missing. Order does not have to be preserved at all. Also, i thought tuples didn't really exist in javascript? – Matthew Martini Jan 01 '15 at 21:26
  • I'm using the word "tuple" loosely. If there are really many tens of thousands of records, then a non-fancy approach will almost certainly cause the browser to ask the user if the script should be stopped (unless you're using web workers). – Pointy Jan 01 '15 at 21:30
  • Also, what sorts of values are in the sub-arrays? Are they numbers? Strings? Something else? – Pointy Jan 01 '15 at 21:35
  • And is it OK for `[a, b]` to be in the array more than once, or do you mean that if `[a, b]` is in the array once, then neither `[a,b]` nor `[b,a]` should be in the array subsequently? – Pointy Jan 01 '15 at 21:36
  • @MatthewMartini Pointy has asked a lot of worthwhile questions, and you have answered very few of them. It looks like you're now asking other questions because your attempts to solve this one are consuming all of your browser's memory. Please answer people's questions when they are trying to help you. What are the objects `a`, `b`, and `c`? What do they contain? There is very likely an efficient answer to the problem you're trying to solve, but it will be hard to give you one if you're not willing to give us sufficient information about it. – JLRishe Jan 03 '15 at 14:45

3 Answers3

1

Based on my understanding that you want to remove from the parent array any children arrays which hold the same set of objects without regard for order, this should do it is some code:

function getId(obj) { // apparently these objects have identifiers
  return obj._id; // I'm testing with MongoDB documents
}
function arraysEqual(a, b) {
  if (a === b) { return true; }
  if (a == null || b == null) { return false; }
  if (a.length != b.length) { return false; }
  aIds = [];  bIds = [];
  for (var i = 0; i < a.length; i++) {
    aIds.push(getId(a[i])); bIds.push(getId(b[i]));
  }
  aIds.sort(); bIds.sort();
  for ( var i = 0; i < aIds.length; i++ ) {
    if(aIds[i] !== bIds[i]) { return false; }
  }
  return true;
}
function removeRepeats(list) {
  var i, j;
  for (i=0; i < list.length; i++) {
    for (j=i+1; j < list.length; j++) {
      if (arraysEqual(list[i], list[j])) {
        list.splice(j,1);
      }
    }
  }
}

The removeRepeats function goes through each element and compares it with every element that comes after it. The arraysEqual function simply returns true if the arrays are equal. The isEquivalent function should test object equivalence. As noted on that webpage, there are libraries that test object equivalence. If you are okay with adding those libraries, you can replace the isEquivalent function with _.isEqual.

Community
  • 1
  • 1
NoOutlet
  • 1,949
  • 1
  • 14
  • 22
  • not an exact answer, but i don't know if it is wrong yet, im playing with it to see if i can make this work. As it stands `arraysEqual` always returns false, and `removeRepeats` always returns undefined. Is there anything that is particularly unclear in the question? – Matthew Martini Jan 01 '15 at 21:37
  • The `arraysEqual()` function may be returning `false` all the time because no two objects are ever equal to each other. If, for example, the array started off as JSON, then every value will be distinct and will not be `===` any other value, regardless of what the objects look like. – Pointy Jan 01 '15 at 21:40
  • (The OP clarified in a comment on the question that the values involved are actually objects.) – Pointy Jan 01 '15 at 21:41
  • @NoOutlet right, this is probably as close as the OP is going to get without providing more detail about the nature of his data. – Pointy Jan 01 '15 at 21:53
  • Not just yet...ill try when i get back to computer along with another promising answer from @acontrell below. I think i see how it works...looks promising – Matthew Martini Jan 02 '15 at 00:32
1

I think I'm going to try a different approach to this problem. I also think it'll be quicker than some of the solutions proposed (though we'd need of course to test it and benchmark it).

First off, why don't we take advantage of the hash oriented nature of javascript arrays and objects? We could create an object containing the relations (in order to create a kind of a map) and store in a new array those relationships that hasn't been stored yet. With this approach there's no problem about objects either, we just request for an identifier or hash or whatever for every object. This identifier must make the relationship between them possible.

UPDATE

  • The script now controls the possibility of repeated elements f.e [[a,b],[a,b]]
  • The script now controls the possibility of elements with the same object repeated f.e [[a,a],[a,a][a,a]] would return [a,a]

The code:

var temp = {},
    massive_arr = [['a','b'],['a','c'],['a','d'], ['b','a'],['b','c'],['b','d'],['c','a'],['c','b'],['c','d']],
    final_arr = [],
    i = 0,
    id1,
    id2;
for( ; i < massive_arr.length; i++ ) {
    id0 = objectIdentifier(massive_arr[i][0]);// Identifier of first object
    id1 = objectIdentifier(massive_arr[i][1]);// Identifier of second object

    if(!temp[id0]) {// If the attribute doesn't exist in the temporary object, we create it.
        temp[id0] = {};
        temp[id0][id1] = 1;
    } else {// if it exists, we add the new key.
        temp[id0][id1] = 1;
    }

    if( id0 === id1 && !temp[id0][id1+"_bis"] ) {// Especial case [a,a]
        temp[id0][id1+"_bis"] = 1;
        final_arr.push(massive_arr[i]);
        continue;// Jump to next iteration
    }

    if (!temp[id1]) {// Store element and mark it as stored.
      temp[id1] = {};
      temp[id1][id0] = 1;
      final_arr.push(massive_arr[i]);
      continue;// Jump to next iteration
    }

    if (!temp[id1][id0]) {// Store element and mark it as stored.
      temp[id1][id0] = 1;
      final_arr.push(massive_arr[i]);
    }
}
console.log(final_arr);

function objectIdentifier(obj) {
    return obj;// You must return a valid identifier for the object. For instance, obj.id or obj.hashMap... whatever that identifies it unequivocally.
}

You can test it here

SECOND UPDATE

Though this is not what was requested in the first place, I've changed the method a bit to adapt it to elements of n length (n can vary if desired).

This method is slower due to the fact that relies on sort to generate a valid key for the map. Even so, I think it's fast enough.

var temp = {},
massive_arr = [
    ['a', 'a', 'a'], //0
    ['a', 'a', 'b'], //1
    ['a', 'b', 'a'],
    ['a', 'a', 'b'],
    ['a', 'c', 'b'], //2
    ['a', 'c', 'd'], //3
    ['b', 'b', 'c'], //4
    ['b', 'b', 'b'], //5
    ['b', 'b', 'b'],
    ['b', 'c', 'b'],
    ['b', 'c', 'd'], //6
    ['b', 'd', 'a'], //7
    ['c', 'd', 'b'],
    ['c', 'a', 'c'], //8
    ['c', 'c', 'a'],
    ['c', 'd', 'a', 'j'], // 9
    ['c', 'd', 'a', 'j', 'k'], // 10
    ['c', 'd', 'a', 'o'], //11
    ['c', 'd', 'a']
],
    final_arr = [],
    i = 0,
    j,
    ord,
    key;
for (; i < massive_arr.length; i++) {
    ord = [];
    for (j = 0; j < massive_arr[i].length; j++) {
        ord.push(objectIdentifier(massive_arr[i][j]));
    }

    ord.sort();
    key = ord.toString();

    if (!temp[key]) {
        temp[key] = 1;
        final_arr.push(massive_arr[i]);
    }
}

console.log(final_arr);

function objectIdentifier(obj) {
    return obj;
}

It can be tested here

acontell
  • 6,792
  • 1
  • 19
  • 32
  • I havent tried it with the objects yet but it looks like it should work. Will accept as answer as soon as i try implementing it. – Matthew Martini Jan 01 '15 at 23:21
  • @MatthewMartini Thanks, please let me know if it works, I'm curious about it. BTW, now it checks if there are repeated elements and only store them once. For instance, [[a,b],[a,b]] would store [a,b] only once. Hope it helps. – acontell Jan 02 '15 at 00:04
  • Works perfectly, didn't benchmark, but works very quickly with sets under 1000 so far. The problem is, if i add a third element to each subarray. just adding `temp[id2]` is clearly not a solution. i'll keep trying to see if there is a way to use subarrays of `n` length as opposed to just 2. – Matthew Martini Jan 02 '15 at 02:23
  • @MatthewMartini now it works for elements of n length (n length can be different from one element to another). It's slower though than the previous one. Check it out if you like. Hope it helps. – acontell Jan 02 '15 at 08:56
  • 1
    @acontell I tested this solution on an array of 10,000 random length (between 2 and 5) arrays where all the values are objects ( which look like `{"_id": , "level": , "language": "eng"}` ) and it completed in less than a half second. My solution ( even after replacing the object equivalence test with an _id check ) took about a minute with the same result (only 6 duplicates). I then tested your solution on an array of 100,000 arrays and it took about a second (363 duplicates). I'm comfortable saying this is the best solution. – NoOutlet Jan 02 '15 at 18:00
  • @NoOutlet Thanks for the info, much appreciated. All the best. – acontell Jan 02 '15 at 18:04
0
*** 
* Turns out the OP has objects in his list, so this approach won't
* work in that case. I'll leave this for future reference.
***

var foo = [['a','b'],['a','c'],['b','a'],['b','c'],['c','a'],['c','b']];

function removeRepeats(list) {
    var i;
    var b = [];
    var _c = [];

    for (i = 0; i < list.length; i++) {
        var a = list[i].sort();
        var stra = a.join("-");

        if(_c.indexOf(stra) === -1) {
            b.push(a);
            _c.push(stra);
        }
    }

    return b;
}

console.log(removeRepeats(foo));

It's not the most pretty code I've ever produced, but it should be enough to get you started I guess. What I'm doing is creating two new arrays, b and _c. b will be the array without the repeats. _c is a helper array which contains all the unique pairs already processed as a string, so I can do easy string comparisons while looping through list.

Bjorn
  • 5,272
  • 1
  • 24
  • 35
  • 1
    I invite you to try this on a starting "foo" array containing "many tens of thousands" of entries. Each of those calls to `.indexOf()` are going to take longer and longer (assuming the array doesn't start off with almost all duplicates). – Pointy Jan 01 '15 at 21:34
  • It could be easily replaced with a `for` (red: it said `while` before, but turns out performance of while is even worse, see link below) loop if performance turns out to be unacceptable. – Bjorn Jan 01 '15 at 21:36
  • A `while` loop will suffer from exactly the same issue; it's an algorithmic problem. – Pointy Jan 01 '15 at 21:37
  • Based on this answer http://stackoverflow.com/questions/6682951/why-is-looping-through-an-array-so-much-faster-than-javascripts-native-indexof `indexOf` does a bunch of extra stuff which we don't need so maybe the performance gain is already sufficient. – Bjorn Jan 01 '15 at 21:38
  • Also note that he clarified that the values are objects, so it's going to take something much more involved. – Pointy Jan 01 '15 at 21:38
  • If a,b etc. are objects things are going to get much worse. This wasn't specified in the original question. – Bjorn Jan 01 '15 at 21:40