2

Apologies about the vague titles, wasn't really sure how else to describe the problem.

I recently ran into a case where I have to loop through an array of objects to compare multiple value, I opted for using a for loop in a for loop to compare each object with every other object.

While this works fine on small arrays, once my array gets a little bigger (say 10,000 objects) performance tends to be a big issue.

This array contains these kind of objects:

[{ char: '_', from: 0, to: 2, chrLength: 2 },
{ char: '_', from: 0, to: 7, chrLength: 7 },
{ char: 'a', from: 1, to: 3, chrLength: 2 },
{ char: 'a', from: 1, to: 6, chrLength: 5 },
{ char: '_', from: 2, to: 7, chrLength: 5 },
{ char: 'a', from: 3, to: 6, chrLength: 3 }]

The idea is that I can only select objects where the from and to don't overlap with any other object. (from and to are indexes in another array)

So for the example array, possible outcomes would be:

[{ char: '_', from: 0, to: 2, chrLength: 2 },
 { char: 'a', from: 1, to: 3, chrLength: 2 },
 { char: 'a', from: 1, to: 6, chrLength: 5 },
 { char: 'a', from: 3, to: 6, chrLength: 3 }]

The way I handled it was as follows:

var canUse = true,
    posibilities = [];
for(i = 0; i < l; i++) {
    canUse = true;
    for(var j = 0; j < l; j++) {
        if((results[i].from < results[j].from && results[i].to > results[j].to)) {
            canUse = false;
            break;
        }
    }

    if(canUse) posibilities.push(results[i]);
}

Seeing performance is quite terrible with larger arrays, I'm wondering if there is a better solution to do this?

woutr_be
  • 9,532
  • 24
  • 79
  • 129

3 Answers3

1

Start by sorting the objects on the chrLength property. When you look for objects that keep an object from being included, you only have to check the objects that are at least two characters shorter.

results.sort(function(x, y){ return x.chrLength - y.chrLength; });

var posibilities = [];
for (var i = 0; i < l; i++) {
  var canUse = true, len = results[i].chrLength - 2;
  for (var j = 0; results[j].chrLength <= len; j++) {
    if((results[i].from < results[j].from && results[i].to > results[j].to)) {
      canUse = false;
      break;
    }
  }
  if(canUse) posibilities.push(results[i]);
}

With your example data, this reduces the number of checks from 36 in the in the original code to only 8.

Comparison: http://jsfiddle.net/Guffa/5jsSb/

Edit:

You can make an array where each item is an array of objects with the same chrLength, and then sort each array on the from property. That way you can easily skip to the point where the objects start to overlap, and stop comparing as soon as they don't overlap any more:

var map = [];
for (var i = 0; i < l; i++) {
  var ch = results[i].chrLength;
  while (map.length <= ch) map.push([]);
  map[ch].push(results[i]);
}
for (var i = 1; i < map.length; i++) {
  map[i].sort(function(x, y){ return x.from - y.from; });
}

var posibilities = [];
for (var i = 0; i < l; i++) {
  var canUse = true, len = results[i].chrLength - 2, from = results[i].from, to = results[i].to;
  for (var j = 1; canUse && j <= len; j++) {
    if (map[j][map[j].length - 1].from > from) {
      var k;
      for (k = 0; map[j][k].from <= from; k++);
      for (;k < map[j].length && map[j][k].from < to; k++) {
        if (map[j][k].to < to) {
          canUse = false;
          break;
        }
      }
    }
  }
  if(canUse) posibilities.push(results[i]);
}

This splits the checks for the from and the to properties in two stages, so the number of complete checks (where map[j][k].to < to is evaluated) is actually smaller than the total number of objects.

Disclaimer: Naturally you would need to verify that the code does the right thing. I have checked that the result has the same number of items, but I haven't compared each item.

Community
  • 1
  • 1
Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • Thanks, that does seem to reduce the amount of checks, but with some larger arrays it still takes quite a bit. Just wondering if there's any way to eliminate items before performing a for loop – woutr_be Jun 18 '14 at 10:41
  • @woutr_be: How does the number of checks change for larger sets? Consider that you can't ever get any better than O(n), i.e. at least one check per item. Can you show a larger example set? It's hard to find any useful patterns in a small set, that could be used for ellimination. – Guffa Jun 18 '14 at 11:22
  • Sure, for example, here's one with 1000 items http://jsfiddle.net/5jsSb/1/ And here's one more extreme: http://jsfiddle.net/5jsSb/3/ (although my final array is double the size) – woutr_be Jun 18 '14 at 11:34
  • @woutr_be: Nice, that made it easier to test some ideas. I made some improvements, and posted the code above. There are some more optimisations that might be done, but it's hard to analyse what's actually happening in the code in a fiddle, it hardly runs at all with that much data. The browser actually crashed once. :) – Guffa Jun 18 '14 at 14:47
  • Guys, I think you are on a wrong way. You need some sort of data structure that will answer the question 'if a given interval has overlaps' in O(log n) times. There is such a data structure called interval tree (google for it). It takes O(n*log n) to construct it and n times O(log n) to check every interval for overlaps. The resulting complexity will be O(n*log n). – Yury Tarabanko Jun 18 '14 at 16:00
  • @YuryTarabanko: That's possible, but this is an unusual kind of overlap where standard solutions doesn't fit straight off. For example the interval 5 to 10 doesn't have overlaps in the intervals 4 to 6, 5 to 8, 7 to 10 or 8 to 12, but in the intervals 6 to 8 and 7 to 9. – Guffa Jun 18 '14 at 16:06
  • @Guffa This fact actually makes the problem easier. It allows to eliminate 'overlaping' intervals when building the tree. – Yury Tarabanko Jun 18 '14 at 23:04
  • @YuryTarabanko: I don't think so, you need all of the intervals, as any interval can affect another interval. – Guffa Jun 18 '14 at 23:21
  • @Guffa How about this? http://jsfiddle.net/tarabyte/5jsSb/10/ If an interval is to the left of the current root it can't affect anything to the right, thats the point. You can check my answer. – Yury Tarabanko Jun 18 '14 at 23:26
1

Here is the idea (Demo):

  1. You need some sort of self-balancing tree supporting insert and remove operations of O(log n). For the sake of simplicity I've used Red-Black Tree.
  2. You need to use mid point of your interval as a key i.e. (from + to)/2.
  3. Suppose you have already "inserted" k items into your tree and are about to insert k+1. Your steps are:
  4. If k+1 covers the root - ignore it.
  5. If k+1 is covered by the root - remove the root from the tree and make another attempt.
  6. Else continue to the left or the right subtree by comparing k+1's mid to the root's one.
  7. When everything is inserted traverse the resulting tree collecting every node.
  8. Profit... I've used 4 times the array of yours by concating it with itself. The result on my machine under Chrome is 116ms (cold start) and 64ms (after warm up).

Code

 function process() {
    console.log('Processing results of length: ' + l);

    console.time('Processing');

    var comparator = function(a, b) { //Comparator to build a tree
           return a.mid - b.mid;
        },
        isAinB = function(a, b) { //util function to check if a is inside b
            return b.from < a.from && b.to > a.to;    
        },
        rbtree = new RBTree(comparator), //Build an empty tree
        i = results.length - 1, item, posibilities = [];

    function check(root, x) { //Recursive checker
        var data;        

        if(!root) { //Either tree is empty or we've reached a leaf
            rbtree.insert(x);
            return;
        }

        data = root.data;

        if(isAinB(data, x)) { //4
            return;    
        }
        if(isAinB(x, data)) { //5
            rbtree.remove(data);
            check(rbtree._root, x);
            return;
        }    

        check(root[comparator(data, x) > 0 ? 'left' : 'right'], x); //6
    }

    for(; i >= 0; i--) { 
        item = results[i];
        item.mid = (item.from + item.to)/2; //2
        check(rbtree._root, item); //3
    }

    rbtree.each(function(item) { //7
        posibilities.push(item);
    });
    console.timeEnd('Processing');

    console.log(posibilities.length);
}

BTW I've used this RBTree implementation. Not sure if it is the best one :)

Yury Tarabanko
  • 44,270
  • 9
  • 84
  • 98
  • Thank you so much for the example, I don't have any experience with trees, looking into RBTree now to see if I can re-create your demo. – woutr_be Jun 19 '14 at 02:34
  • @woutr_be Self-balancing trees are very useful data structures when you have some sort of comparison based queries. https://mitpress.mit.edu/books/introduction-algorithms chapters 12-13 – Yury Tarabanko Jun 19 '14 at 08:59
0

Well for starters, once canUse is false you don't need to carry on with the inner loop.

You can either add a break; or change the second for loop to be:

for (var j = 0; canUse && (j < l); j++)

and you'll probably see a useful speed up.

foz
  • 3,121
  • 1
  • 27
  • 21