algorithm to find longest non-overlapping sequences

Question

I am trying to find the best way to solve the following problem. By best way I mean less complex.

As an input a list of tuples (start,length) such:

[(0,5),(0,1),(1,9),(5,5),(5,7),(10,1)]

Each element represets a sequence by its start and length, for example (5,7) is equivalent to the sequence (5,6,7,8,9,10,11) - a list of 7 elements starting with 5. One can assume that the tuples are sorted by the start element.

The output should return a non-overlapping combination of tuples that represent the longest continuous sequences(s). This means that, a solution is a subset of ranges with no overlaps and no gaps and is the longest possible - there could be more than one though.

For example for the given input the solution is:

[(0,5),(5,7)] equivalent to (0,1,2,3,4,5,6,7,8,9,10,11)

is it backtracking the best approach to solve this problem ?

I'm interested in any different approaches that people could suggest.

Also if anyone knows a formal reference of this problem or another one that is similar I'd like to get references.

BTW - this is not homework.

Edit

Just to avoid some mistakes this is another example of expected behaviour

for an input like [(0,1),(1,7),(3,20),(8,5)] the right answer is [(3,20)] equivalent to (3,4,5,..,22) with length 20. Some of the answers received would give [(0,1),(1,7),(8,5)] equivalent to (0,1,2,...,11,12) as right answer. But this last answer is not correct because is shorter than [(3,20)].

What exactly do you mean by longest continuous lines? Is a single line of length 7 better than two separated ones of length 5 each (e.g. [(0,5), (7,5)] )? — Rafał Dowgird, Jan 04 '11 at 12:53
By longest I mean that for instance [(0,100)] is longer than [(0,10),(10,5)]. Since the first one covers the sequence (0,1,2,...,99) and the second one just (0,1,..,9,10,..,14). — Manuel Salvadores, Jan 04 '11 at 14:20
@msalvadores: So you don't mean "continuous" at all, right? Just the total number of numbers represented? In that case I suggest removing the word "continuous" from your question and rewording slightly. — j_random_hacker, Jan 04 '11 at 15:51
@msalvadores: So the answer must be a subset of the ranges that not only have no overlaps but also no "gaps"? The word "lines" (plural) confused me, suggesting that a single answer could contain multiple non-adjacent "lines", each made up of adjacent ranges from the input. — j_random_hacker, Jan 04 '11 at 17:46

Luke Hutteman · Accepted Answer · 2011-01-04T20:09:09.397

13

Iterate over the list of tuples using the given ordering (by start element), while using a hashmap to keep track of the length of the longest continuous sequence ending on a certain index.

pseudo-code, skipping details like items not found in a hashmap (assume 0 returned if not found):

int bestEnd = 0;
hashmap<int,int> seq // seq[key] = length of the longest sequence ending on key-1, or 0 if not found
foreach (tuple in orderedTuples) {
    int seqLength = seq[tuple.start] + tuple.length
    int tupleEnd = tuple.start+tuple.length;
    seq[tupleEnd] = max(seq[tupleEnd], seqLength)
    if (seqLength > seq[bestEnd]) bestEnd = tupleEnd
}
return new tuple(bestEnd-seq[bestEnd], seq[bestEnd])

This is an O(N) algorithm.

If you need the actual tuples making up this sequence, you'd need to keep a linked list of tuples hashed by end index as well, updating this whenever the max length is updated for this end-point.

UPDATE: My knowledge of python is rather limited, but based on the python code you pasted, I created this code that returns the actual sequence instead of just the length:

def get_longest(arr):
    bestEnd = 0;
    seqLengths = dict() #seqLengths[key] = length of the longest sequence ending on key-1, or 0 if not found
    seqTuples = dict() #seqTuples[key] = the last tuple used in this longest sequence
    for t in arr:
        seqLength = seqLengths.get(t[0],0) + t[1]
        tupleEnd = t[0] + t[1]
        if (seqLength > seqLengths.get(tupleEnd,0)):
            seqLengths[tupleEnd] = seqLength
            seqTuples[tupleEnd] = t
            if seqLength > seqLengths.get(bestEnd,0):
                bestEnd = tupleEnd
    longestSeq = []
    while (bestEnd in seqTuples):
        longestSeq.append(seqTuples[bestEnd])
        bestEnd -= seqTuples[bestEnd][1]
    longestSeq.reverse()
    return longestSeq


if __name__ == "__main__":
    a = [(0,3),(1,4),(1,1),(1,8),(5,2),(5,5),(5,6),(10,2)]
    print(get_longest(a))

edited Jan 04 '11 at 20:09

answered Jan 04 '11 at 16:52

Luke Hutteman

1,921
13
11

Thanks, it works. But just to give me the full sequence. I am actually looking for the actual tuples that form the solution. Here you a simple Pythonic version of your algorithm http://paste.ideaslabs.com/show/hIicj5cQdg - Thanks !!! – Manuel Salvadores Jan 04 '11 at 17:55
1

Like I said in my answer, keeping a linked list of the actual tuples that make up this sequence should be relatively easy. Whenever you find a longer sequence, simply take the previous linked list and add the new tuple to it. You could either store these lists of tuples in a seperate hashmap, or extend the existing one for this purpose, making the value a class that combines both the max length as well as a linked list of tuples making up this max – Luke Hutteman Jan 04 '11 at 18:03
2

The update is correct and a great answer. http://paste.ideaslabs.com/show/uOR5k0db5 tweaks the updated Python algorithm to handle an edge case and reverses the output to start with the lowest tuple first. – orangepips Jan 04 '11 at 19:46
good catch @orangepips - I updated the code to reflect your changes – Luke Hutteman Jan 04 '11 at 20:09
@Luke Hutterman: also change `while(bestEnd in seqTuples):` to `while (bestEnd != 0 and bestEnd in seqTuples):` – orangepips Jan 04 '11 at 20:10
@orangepips: I don't think the `!= 0` is necessary anymore; we can simply iterate until the current index is no longer in `seqTuples`. Also, the `!= 0` is potentially harmful in case the input contains negative start indices, like with `a = [(-1,1),(0,2)]` – Luke Hutteman Jan 04 '11 at 20:18
@Luke Hutteman: sorry, you're correct, wasn't thinking clearly. – orangepips Jan 04 '11 at 20:23
+1. Hashmaps are "for free"? Unless they're for free, the algorithm is not O(N) because the time taken to add and lookup stuff to the hashmap is not taken into account. Anyway, whatever cost a hashmap has, it's better then binary lookup (that's log(N)) so it's faster then mine, +1. – Cosmin Prund Jan 05 '11 at 06:54
@LukeHutteman won't this algo give wrong answer if the input tuples are `not sorted` by start time ? – Aseem Goyal Dec 02 '13 at 05:58
@aseem: Yes, but the question explicitly stated "One can assume that the tuples are sorted by the start element." so I chose to use that in my algorithm. – Luke Hutteman Apr 04 '14 at 17:47

Handcraftsman · Answer 2 · 2011-01-04T18:34:10.690

Revised algorithm:

create a hashtable of start->list of tuples that start there
put all tuples in a queue of tupleSets
set the longestTupleSet to the first tuple
while the queue is not empty
    take tupleSet from the queue
    if any tuples start where the tupleSet ends
        foreach tuple that starts where the tupleSet ends
            enqueue new tupleSet of tupleSet + tuple
        continue

    if tupleSet is longer than longestTupleSet
        replace longestTupleSet with tupleSet

return longestTupleSet

c# implementation

public static IList<Pair<int, int>> FindLongestNonOverlappingRangeSet(IList<Pair<int, int>> input)
{
    var rangeStarts = input.ToLookup(x => x.First, x => x);
    var adjacentTuples = new Queue<List<Pair<int, int>>>(
        input.Select(x => new List<Pair<int, int>>
            {
                x
            }));

    var longest = new List<Pair<int, int>>
        {
            input[0]
        };
    int longestLength = input[0].Second - input[0].First;

    while (adjacentTuples.Count > 0)
    {
        var tupleSet = adjacentTuples.Dequeue();
        var last = tupleSet.Last();
        int end = last.First + last.Second;
        var sameStart = rangeStarts[end];
        if (sameStart.Any())
        {
            foreach (var nextTuple in sameStart)
            {
                adjacentTuples.Enqueue(tupleSet.Concat(new[] { nextTuple }).ToList());
            }
            continue;
        }
        int length = end - tupleSet.First().First;
        if (length > longestLength)
        {
            longestLength = length;
            longest = tupleSet;
        }
    }

    return longest;
}

tests:

[Test]
public void Given_the_first_problem_sample()
{
    var input = new[]
        {
            new Pair<int, int>(0, 5),
            new Pair<int, int>(0, 1),
            new Pair<int, int>(1, 9),
            new Pair<int, int>(5, 5),
            new Pair<int, int>(5, 7),
            new Pair<int, int>(10, 1)
        };
    var result = FindLongestNonOverlappingRangeSet(input);
    result.Count.ShouldBeEqualTo(2);
    result.First().ShouldBeSameInstanceAs(input[0]);
    result.Last().ShouldBeSameInstanceAs(input[4]);
}

[Test]
public void Given_the_second_problem_sample()
{
    var input = new[]
        {
            new Pair<int, int>(0, 1),
            new Pair<int, int>(1, 7),
            new Pair<int, int>(3, 20),
            new Pair<int, int>(8, 5)
        };
    var result = FindLongestNonOverlappingRangeSet(input);
    result.Count.ShouldBeEqualTo(1);
    result.First().ShouldBeSameInstanceAs(input[2]);
}

@Handcraftsman I don't think this algorithm would be right for the following case. Input [(0,2),(2,8),(3,50)] in this case the right answer is [(3,50)] because it contains the longest sequence - 50 elements in this case. — Manuel Salvadores, Jan 04 '11 at 15:47
Huh. I clearly misunderstood the question. I'll have to think on this more. — Handcraftsman, Jan 04 '11 at 16:03
@Handcraftsman thanks ! easy to read algorithm. I could interpret you pseudo-code and write a python version that works nicely !! http://paste.ideaslabs.com/show/65N44moAqJ any comments on complexity ? — Manuel Salvadores, Jan 04 '11 at 19:42
@msalvadores you might want to add a python 'continue' equivalent at the end of the 'if (tupleSet[-1] ...' block . Tuples being added to the queue may not be at their max length yet so no reason to incur the potential cost of comparing them to the longest item. — Handcraftsman, Jan 04 '11 at 19:53
While this approach would work, it has worst case exponential complexity as for every tupleset you dequeue, you potentially enqueue several new ones. — Luke Hutteman, Jan 05 '11 at 14:06

score 2 · Answer 3 · answered Jan 05 '11 at 08:11

This is a special case of the longest path problem for weighted directed acyclic graphs.

The nodes in the graph are the start points and the points after the last element in a sequence, where the next sequence could start.

The problem is special because the distance between two nodes must be the same independently of the path.

Nylon Smile · Answer 4 · 2011-01-05T02:52:52.693

I removed the previous solution because it was not tested.

The problem is finding the longest path in a "weighted directed acyclic graph", it can be solved in linear time:

http://en.wikipedia.org/wiki/Longest_path_problem#Weighted_directed_acyclic_graphs

Put a set of {start positions} union {(start position + end position)} as vertices. For your example it would be {0, 1, 5, 10, 11, 12}

for vertices v0, v1 if there is an end value w that makes v0 + w = v1, then add a directed edge connecting v0 to v1 and put w as its weight.

Now follow the pseudocode in the wikipedia page. since the number of vertices is the maximum value of 2xn (n is number of tuples), the problem can still be solved in linear time.

Hi, I don't think your answer takes into account the fact that there could be more than one triple starting in the same index. If so please clarify better the construction of the table "hash keys" - thanks !!! — Manuel Salvadores, Jan 04 '11 at 19:00

Cosmin Prund · Answer 5 · 2011-01-04T19:23:57.903

Edited to replace pseudocode with actual Python code

Edited AGAIN to change the code; The original algorithm was on the solution, but I missunderstood what the second value in the pairs was! Fortunatelly the basic algorithm is the same, and I was able to change it.

Here's an idea that solves the problem in O(N log N) and doesn't use a hash map (so no hidden times). For memory we're going to use N * 2 "things".

We're going to add two more values to each tuple: (BackCount, BackLink). In the successful combination BackLink will link from right to left from the right-most tuple to the left-most tuple. BackCount will be the value accumulated count for the given BackLink.

Here's some python code:

def FindTuplesStartingWith(tuples, frm):
    # The Log(N) algorithm is left as an excersise for the user
    ret=[]
    for i in range(len(tuples)):
        if (tuples[i][0]==frm): ret.append(i)
    return ret

def FindLongestSequence(tuples):

    # Prepare (BackCount, BackLink) array
    bb=[] # (BackCount, BackLink)
    for OneTuple in tuples: bb.append((-1,-1))

    # Prepare
    LongestSequenceLen=-1
    LongestSequenceTail=-1

    # Algorithm
    for i in range(len(tuples)):
        if (bb[i][0] == -1): bb[i] = (0, bb[i][1])
        # Is this single pair the longest possible pair all by itself?
        if (tuples[i][1] + bb[i][0]) > LongestSequenceLen:
            LongestSequenceLen = tuples[i][1] + bb[i][0]
            LongestSequenceTail = i
        # Find next segment
        for j in FindTuplesStartingWith(tuples, tuples[i][0] + tuples[i][1]):
            if ((bb[j][0] == -1) or (bb[j][0] < (bb[i][0] + tuples[i][1]))):
                # can be linked
                bb[j] = (bb[i][0] + tuples[i][1], i)
                if ((bb[j][0] + tuples[j][1]) > LongestSequenceLen):
                    LongestSequenceLen = bb[j][0] + tuples[j][1]
                    LongestSequenceTail=j

    # Done! I'll now build up the solution
    ret=[]
    while (LongestSequenceTail > -1):
        ret.insert(0, tuples[LongestSequenceTail])
        LongestSequenceTail = bb[LongestSequenceTail][1]
    return ret

# Call the algoritm
print FindLongestSequence([(0,5), (0,1), (1,9), (5,5), (5,7), (10,1)])
>>>>>> [(0, 5), (5, 7)]
print FindLongestSequence([(0,1), (1,7), (3,20), (8,5)])    
>>>>>> [(3, 20)]

The key for the whole algorithm is where the "THIS IS THE KEY" comment is in the code. We know our current StartTuple can be linked to EndTuple. If a longer sequence that ends at EndTuple.To exists, it was found by the time we got to this point, because it had to start at an smaller StartTuple.From, and the array is sorted on "From"!

Phil H · Answer 6 · 2011-01-11T17:26:14.940

This is a simple reduce operation. Given a pair of consecutive tuples, they either can or can't be combined. So define the pairwise combination function:

def combo(first,second):
    if first[0]+first[1] == second[0]:
        return [(first[0],first[1]+second[1])]
    else:
        return [first,second]

This just returns a list of either one element combining the two arguments, or the original two elements.

Then define a function to iterate over the first list and combine pairs:

def collapse(tupleList):
    first = tupleList.pop(0)
    newList = []
    for item in tupleList:
        collapsed = combo(first,item)
        if len(collapsed)==2:
            newList.append(collapsed[0])
        first = collapsed.pop()
    newList.append(first)
    return newList

This keeps a first element to compare with the current item in the list (starting at the second item), and when it can't combine them it drops the first into a new list and replaces first with the second of the two.

Then just call collapse with the list of tuples:

>>> collapse( [(5, 7), (12, 3), (0, 5), (0, 7), (7, 2), (9, 3)] )
[(5, 10), (0, 5), (0, 12)]

[Edit] Finally, iterate over the result to get the longest sequence.

def longest(seqs):
    collapsed = collapse(seqs)
    return max(collapsed, key=lambda x: x[1])

[/Edit]

Complexity O(N). For bonus marks, do it in reverse so that the initial pop(0) becomes a pop() and you don't have to reindex the array, or move the iterator instead. For top marks make it run as a pairwise reduce operation for multithreaded goodness.

to run it with Python, it should be combo([first,item]) instead of combo(first,item) in the collapse function. — Manuel Salvadores, Jan 04 '11 at 14:32
I've fixed that, thanks. combo now takes two tuple arguments rather than a list. — Phil H, Jan 04 '11 at 14:34
@msalvadores: Have added a quick function to find the longest of the collapsed sequences. I forgot the final step! Assuming that you want the collapsed sequence, not the expanded sequence. — Phil H, Jan 11 '11 at 17:30

score 1 · Answer 7 · answered Jan 04 '11 at 17:11

Just thinking about the algorithm in basic terms, would this work?

(apologies for horrible syntax but I'm trying to stay language-independent here)

First the simplest form: Find the longest contiguous pair.

Cycle through every member and compare it to every other member with a higher startpos. If the startpos of the second member is equal to the sum of the startpos and length of the first member, they are contiguous. If so, form a new member in a new set with the lower startpos and combined length to represent this.

Then, take each of these pairs and compare them to all of the single members with a higher startpos and repeat, forming a new set of contiguous triples (if any exist).

Continue this pattern until you have no new sets.

The tricky part then is you have to compare the length of every member of each of your sets to find the real longest chain.

I'm pretty sure this is not as efficient as other methods, but I believe this is a viable approach to brute forcing this solution.

I'd appreciate feedback on this and any errors I may have overlooked.

ivy · Answer 8 · 2011-01-04T12:42:55.053

This sounds like a perfect "dynamic programming" problem...

The simplest program would be to do it brute force (e.g. recursive), but this has exponential complexity.

With dynamic programming you can set up an array a of length n, where n is the maximum of all (start+length) values of your problem, where a[i] denotes the longest non-overlapping sequence up to a[i]. You can then step trought all tuples, updating a. The complexity of this algorithm would be O(n*k), where k is the number of input values.

Aliostad · Answer 9 · 2011-01-04T13:19:36.187

0

Create an ordered array of all start and end points and initialise all of them to one
For each item in your tuple, compare the end point (start and end) to the ordered items in your array, if any point is between them (e.g. point in the array is 5 and you have start 2 with length 4) change value to zero.
After finishing the loop, start moving across the ordered array and create a strip when you see 1 and while you see 1, add to the existing strip, with any zero, close the strip and etc.
At the end check the length of strips

I think complexity is around O(4-5*N)

(SEE UPDATE)

with N being number of items in the tuple.

UPDATE

As you figured out, the complexity is not accurate but definitely very small since it is a function of number of line stretches (tuple items).

So if N is number of line stretches, sorting is O(2N * log2N). Comparison is O(2N). Finding line stretches is also O(2N). So all in all O(2N(log2N + 2)).

edited Jan 04 '11 at 13:19

answered Jan 04 '11 at 12:45

Aliostad

80,612
21
160
208

Smart, there's only a limited amount of endpoints. But how do you create an ordered array of all start & endpoints in linear time? I would at least expect some O(n log n) for that (the start points are sorted, but worse case the endpoints only decrease or only increase). And while updating all points (O(n)), looking up a point in the sorted array probably takes you log n as well... – ivy Jan 04 '11 at 12:54
2

Negative complexity class? w00t! – moinudin Jan 04 '11 at 12:57
@Ivy you are right. I did not consider sorting so we will have **O(2N * log2N)** just for sorting. I will update my answer. – Aliostad Jan 04 '11 at 13:13
I think that the second point would set 0 on the (0,5) from the original example (as 1 from (0,1) falls in between), which looks like it might lead to the wrong solution. – Rafał Dowgird Jan 04 '11 at 14:35

algorithm to find longest non-overlapping sequences

9 Answers9

(SEE UPDATE)

Linked