Combining different number ranges in O(n)

Question

I'm currently tracking user play times of videos, and I'm trying to determine the % of a video a user watches. I've generalised the problem to given a series of number ranges that potentially overlap, how to combine them into a series of non-overlapping number ranges (i.e. converting "0-10, 5-15, 30-45, 20-25" into "0-15, 20-25, 30-45".

I have a relatively long-winded solution based on the premise that if the number ranges are sorted, then it is relatively trivial to combine two adjacent number ranges (either combine them if they overlap or they remain separate). Thus, we sort the number ranges first then iterate through the ranges and combining them.

Since sorting is worst case O(nlgn), this means my solution should be O(nlgn), and I was wondering if anyone knows of a O(n) solution to the problem?

http://jsfiddle.net/457PH/2

var testcase = [
        [0, 30], [40, 50], [5, 15], [70, 95], [45, 75], [0, 10],
        [110, 115], [115, 120], [140, 175], [125, 160]
    ];

//sorts the array in ascending order (based on first element)
//if the first elements are the same, order based on second element (prioritising elements that are bigger)
testcase.sort(function(a, b) {
    if (a[0] !== b[0]) return a[0] - b[0];

    return b[1] - a[1]
})


function evaluate(a, b) {
    var result = [];
    //tests that the array is sorted properly
    if ((a[0] > b[0]) || ((a[0] === b[0] ) && (a[1] < b[1]))) throw new Error('Array not sorted properly');

    //if a and b do not overlap, then push both in the result
    if(b[0] > a[1]) {
        result.push(a, b);
    }
    //if a and b overlap
    else {
        var newElement = [a[0], Math.max(a[1], b[1])];
        result.push(newElement);
    }
    return result;
}

console.log(testcase)
var combinedArr = [testcase[0]];
for (var i = 1; i < testcase.length; i++) {
    var popped = combinedArr.pop();
    combinedArr = combinedArr.concat(evaluate(popped, testcase[i]));
}
console.log(combinedArr);

What is the scale of the problem? It seems like you won't have more than a dozen elements in each list (from the description of the problem, correct me if I am wrong). For such a small size there is very little meaning to asymptotic complexity, and it is not unlikely that insertion sort, that is O(n^2) on paper, will achieve better performance than other sorting techniques. — amit, Mar 05 '14 at 16:52
In addition `sorting is worst case O(nlgn)` - this depends on the sort algorithm implemented. Quicksort, for example has O(n^2) worst case, while others (mergsort, for example) have O(nlogn). — amit, Mar 05 '14 at 16:53
No, that's impossible. If you assume your input to be unsorted, and want your output to be ascending, then you won't get around sorting. — Bergi, Mar 05 '14 at 17:12
@amit The scale of the problem isn't likely to be large, and agree that there isn't much of a performance improvement. Having said that, I would like to use this opportunity to improve my knowledge of algorithms, and thus wanted to see if SO experts had any tips. — Dan Tang, Mar 05 '14 at 17:23
@KarolyHorvath: An interval tree can do nothing more than a sorted list of intervals, except updates, which OP doesn't need — Niklas B., Mar 05 '14 at 21:13

amit · Accepted Answer · 2014-03-05T17:28:37.613

An alternative solution that is O(W+n*|S|) where |S| is the average size of each interval and W is the maximal value in the list will be using a bitset, and iterate each element and set all relevant bits.
In another iteration - print all intervals in the bitset (which is sorted).

So, the algorithm for this approach is basically:

Create a bitset of size W where a bit is set only if it is in some interval.
Iterate the bitset and print the intervals - this is fairly easy now.

While this could be much worse in terms of asymptotic complexity if W or |S| are large - note that the constants here are fairly small, since bit operations are fairly easy to implement.

Choosing which is actually better should be done using empirical benchmark and achieving statistical significance.

Pseudo-code:

//create the bitset:
b <- new bitset
for each interval [x1,x2]:
  for each element i from x1 to x2:
     b[i] = 1

//print intervals:
first <- -1
for each element i from 0 to W+1: //regard b[W] as 0
  if b[i] == 0 and first != -1:
     print (first,i-1)
     first = -1
  else if b[i] == 1 and first == -1:
     first = i

(+1) I thought along similar lines...to the extent that one could use a number as the bitset, could the first part be something like: `bitset | (Math.pow(2,top - bottom) - 1) << (bottom - 1))`. In JavaScript, one would need to customize a number larger than 32 bits; and larger powers too, come to think of it. — גלעד ברקן, Mar 05 '14 at 20:21
@גלעדברקן: You can also do `((1 << (top - bottom + 1)) - 1) << bottom` — Niklas B., Mar 05 '14 at 20:37

score 0 · Answer 2 · answered Mar 05 '14 at 20:43

If you just restrict to the case where each of the first half of the intervals are overlapping a distinct member of the second half of the intervals, then the number of possibilities for overlapping combinations of intervals is at least Omega((n/2)!) (i.e. n/2 factorial). Thus, in any comparison based algorithm, you will need at least log((n/2)!) = Omega(n log n) comparisons to distinguish between all these cases. Thus, in any comparison based algorithm, you will need Omega(n log n) time in the worst case.

score 0 · Answer 3 · answered Mar 06 '14 at 04:29

Here's an attempt at the bitset implementation in JavaScript:

function percentWatched(ranges,totalMinutes){
    var numPartitions = Math.ceil(totalMinutes / 31),
        bitset = new Array(numPartitions)

    for (var i in ranges){
        var top = ranges[i][1]
          , bottom = ranges[i][0]
          , m, shift, k

          while (bottom < top){
              m = Math.floor(bottom / 31)
              shift = bottom % 31
              k = shift + top - bottom <= 31 ? top - bottom : 31 - shift
              bitset[m] |= (1 << k) - 1 << shift
              bottom += k
          }
    }

    var minutesWatched = 0
    for (var i in bitset)
        minutesWatched += numSetBits(bitset[i])

    return {percent: 100 * minutesWatched / totalMinutes
           , ranges: bitset}
}

function numSetBits(i) //copied from http://stackoverflow.com/questions/109023
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

Console output:

> var a = percentWatched([[0,10], [5,15], [30,45], [20,25]],100)

> for (var i in a.ranges) console.log(a.ranges[i].toString(2))
"1000001111100000111111111111111"
"11111111111111"

> a.percent
35

Combining different number ranges in O(n)

3 Answers3