3

I have a question that is very similar to algorithm to find longest non-overlapping sequences.

The only difference to the linked question is that instead of finding the set of non-overlapping tuples that represent the longest sequence, I need to find the set of non-overlapping tuples that represent the maximum coverage, by which I mean the sum of the tuple lengths is maximum (a tuple length being last - first + 1 given the definition of tuple in the next sentence).

I represent my tuples differently than the linked problem. Instead of (starting index, length), I represent my tuples as (first, last); for example, the tuple (3,7) represents the set of numbers [3, 4, 5, 6, 7]. (A tuple overlaps another tuple even if the end-points match; i.e., (2,6) and (6,8) overlap and therefore cannot both appear in the solution.)

As an example, consider the following set of tuples, sorted by first:

[(0,100), (2,50), (30,150), (60,95), (110,190), (120,150), (191,200)]

The maximal set here would be [(0,100), (110,190), (191,200)] with a coverage of 101 + 81 + 10 = 192. (Note that the tuples in this solution are non-overlapping.)

What is an example of the least complex algorithm to solve this, and what is the complexity of that algorithm? (It would be just great if this could be solved in O(N), but I don't know at the moment if it can be.)

ADDENDUM: In retrospect, it turns out that the question I am asking here is equivalent to the weighted interval scheduling problem. This is a special case of the interval scheduling problem.

@j_random_hacker's answer, below, is, in fact, the known solution to the weighted interval scheduling problem, with a complexity in time of O(N log(N)).

Community
  • 1
  • 1
Dan Nissenbaum
  • 13,558
  • 21
  • 105
  • 181
  • 3
    Regarding closing this question as "too broad" - I note that the linked question, of equal "broad"-ness (and numerous others with these tags), are not closed. – Dan Nissenbaum Jun 03 '14 at 22:35

2 Answers2

4

Here is an O(nlog n)-time, O(n)-space algorithm. First, sort the array of tuples by their starting position if they are not already in this order. I'll assume zero-based array indices.

Let's call the beginning position of tuple i b(i) and the ending position e(i), so that its total length is e(i) - b(i) + 1. Also let's define a function next(i) that returns the position within the tuple list of the first tuple that can appear to the right-hand side of tuple i. Notice that next(i) can be calculated in O(log n) time with a binary search: just keep all the tuple beginning positions b(i) in an array b[], and search for the first j in the subarray b[i+1 .. n-1] having b[j] > e(i).

Let's define f(i) to be the maximum coverage of any nonoverlapping set of tuples that begins at or after tuple i. Since tuple i itself is either in this optimal set or not, we have:

f(i) = max(e(i) - b(i) + 1 + f(next(i)), f(i+1)) for 0 <= i < n

We also have the boundary condition f(n) = 0.

Clearly the largest possible coverage is given by f(0). This is easily calculated. In pseudo-C++:

int b[] = /* Tuple beginning positions, in nondecreasing order */;
int e[] = /* Tuple end positions */;
int n = /* Number of tuples */;

// Find the array position of the leftmost tuple that begins to the right of
// where tuple i ends.
int next(int i) {
    return upper_bound(b + i + 1, b + n, e[i]);
}

int maxCov[n + 1];    // In practice you should dynamically allocate this

// After running this, maxCov[i] will contain the maximum coverage of any
// nonoverlapping subset of the set of n - i tuples whose beginning positions
// are given by b[i .. n-1] and whose ending points are given by e[i .. n-1].
// In particular, maxCov[0] will be the maximum coverage of the entire set.
void calc() {
    maxCov[n] = 0;
    for (int i = n - 1; i >= 0; --i) {
        maxCov[i] = max(e[i] - b[i] + 1 + maxCov[next(i)], maxCov[i + 1]);
    }
}

The loop in calc() runs n times, and each iteration makes one O(log n) call to the binary search function upper_bound().

We can reconstruct an actual set of this size by calculating both the inputs to max() for f(0), seeing which one actually produced the maximum, recording whether it implies the presence or absence of tuple 0, and then recursing to handle the remainder (corresponding to either f(next(0)) or f(1)). (If both inputs are equal then there are multiple optimal solutions and we can follow either one.)

j_random_hacker
  • 50,331
  • 10
  • 105
  • 169
  • @FuzzyTree: Thanks! Actually I now see it's extremely similar to another answer I wrote not long ago, although my thought process was very different :) – j_random_hacker Jun 04 '14 at 04:37
  • 3
    j_random_hacker and @FuzzyTree - I appreciate your help and these answers. I think that the problem is equivalent to the **weighted interval scheduling problem** (http://farazdagi.com/blog/2013/weighted-interval-scheduling/ - a special case of interval scheduling, https://en.wikipedia.org/wiki/Interval_scheduling). This problem has a known O(n log(n)) (in time) solution. It seems that the algorithm in this answer is equivalent to that solution. Thanks! – Dan Nissenbaum Jun 04 '14 at 15:41
  • @DanNissenbaum: No problem. Yes, it looks to be the same problem, and the first link you gave seems to solve it the same way (except building up from first to last instead of the other way round). – j_random_hacker Jun 04 '14 at 21:55
2

The algorithm below works by recursively retrieving the largest non-overlapping set that each element is the leftmost member of and then returning the one with the largest coverage. See comments in the code.

Implemented in PHP. You can test it here http://viper-7.com/xowTRF

I think this algorithm's complexity is O(2^N) or O(N^2) with caching, feel free to leave a comment if you disagree.

$set = [[0,100], [2,50], [30,150], [60,95], [110,190], [120,150], [191,200]];
$GLOBALS['cache'] = array(); //cache for overlapping sub problems

function maximumSet($set) {

    if(count($set) === 1) {
        return $set;
    }

    $cache_key = [];

    foreach($set as $k) {
        $cache_key[] = implode($k,'.');
    }

    $cache_key = implode('_',$cache_key);

    if(isset($GLOBALS['cache'][$cache_key])) {
        return $GLOBALS['cache'][$cache_key];
    }

    $maximumResult = null;

    //for each element in the set,
    //get the largest non-overlapping set that the element is a member of
    //once all N sets have been computed, return the largest one
    foreach($set as $k => $element) {

        unset($set[$k]);

        //create a new set $copy, which contains the remaining elements that
        //do not overlap with $element            
        $copy = $set;

        foreach($set as $k2 => $element2) {
            if($element2[0] <= $element[1]) { 
                //element is considered non overlapping if its start 
                //is less than or equal to current element's end
                unset($copy[$k2]);
            }
            else break; //because the list is sorted we can break the 1st time
            //see a non-overlapping element
        }

        if(!empty($copy)) {
            //if there is at least one non-overlapping element
            //recursively get the maximum set
            $maximumSubset = maximumSet($copy);
            //prepend the current element to it
            array_unshift($maximumSubset,$element);
        }
        else {
            //otherwise the maximum non-overlapping set which contains this element
            //is the element itself                
            $maximumSubset = [$element];
        }

        //store the set in the results by coverage
        $coverage = getCoverage($maximumSubset);
        if(is_null($maximumResult) || $maximumResult['coverage'] < $coverage) {
            $maximumResult = [
                'coverage' => $coverage,
                'set' => $maximumSubset
            ];
        }
    }

    $GLOBALS['cache'][$cache_key] = $maximumResult['set'];
    return $maximumResult['set'];
}

function getCoverage($set) {
    $range = 0;
    foreach($set as $v) {
        $range += ($v[1] - $v[0]);
    }
    return $range;
}

$x = maximumSet($set);
print "<pre>";
print_r($x);
print "</pre>";
FuzzyTree
  • 32,014
  • 3
  • 54
  • 85
  • A few things: (1) The condition you're testing (whether `$element2` starts before `$element` ends) needs to be flipped around to check that `$element2` *ends* before `$element` *starts*; (2) You never add elements that are to the right of `$element`; (3) If you fixed these problems, your algorithm would be exponential-time, since e.g. if the problem instance has n non-overlapping intervals, then `maximumSet()` will eventually be called with all 2^n - 1 possible nonempty subsets of intervals. – j_random_hacker Jun 04 '14 at 01:31
  • @j_random_hacker The initial array is sorted by starting time and an element's ending time is always greater than its starting time, since `$element2` comes after `$element`, `$element2` couldn't possibly end before `$element` starts? However, `$element2` could start before `$element` ends and that's what I'm checking for. – FuzzyTree Jun 04 '14 at 01:38
  • "`$element2` comes after `$element`" isn't true in general -- e.g. suppose `$k == 2` and `$k2 == 1`. (I don't know PHP but I assume `$k` and `$k2` are array indices.) – j_random_hacker Jun 04 '14 at 01:43
  • @j_random_hacker note the lines `unset($set[$k]); $copy = $set;` (unset removes the element from the set). Since $set is sorted by its element's starting times, $element2 will always be after $element. – FuzzyTree Jun 04 '14 at 01:46
  • Ah, I see now, thanks. In that case the test is fine, though the description at the start would be better changed to "recursively retrieving the largest non-overlapping set that each element is the *leftmost* member of". – j_random_hacker Jun 04 '14 at 01:56
  • 1
    I'll also add that this, along with the memoisation you've added, means that the algorithm is now poly-time (it's at least O(n^3), though may be more due to the copying around of size-n sets). This time complexity would be clearer (and the code faster! :) ) if `maximumSet()` took a single integer parameter `start`, indicating the array position within the global `$set` to start looking, instead of a set itself. Then it's obvious that it can only be called with n distinct parameter values, and it only does O(n^2) non-recursive work per call, for O(n^3) total. – j_random_hacker Jun 04 '14 at 02:00
  • If you add a preprocessing step in which you calculate, for each array position i, what the position j > i is of the first (leftmost) tuple to the right of tuple i that does not overlap it (this can be done in O(n^2) time), then by also switching to using a single integer `start` parameter you can bring the complexity down to O(n^2), since you no longer need the inner loop, just an array lookup :) – j_random_hacker Jun 04 '14 at 02:20
  • @j_random_hacker both good suggestions, thanks. I'll implement them when I get more time – FuzzyTree Jun 04 '14 at 02:29
  • One more idea :) That preprocessing step can actually be done in O(nlog n) time by doing n binary searches, instead of n linear searches. This gives me an idea for an overall O(nlog n) algorithm :) – j_random_hacker Jun 04 '14 at 02:41