How does one find the largest consecutive set of numbers in a list that are not necessarily adjacent?

Question

For instance, if I have a list

[1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]

This algorithm should return [1,2,3,4,5,6,7,8,9,10,11].

To clarify, the longest list should run forwards. I was wondering what is an algorithmically efficient way to do this (preferably not O(n^2))?

Also, I'm open to a solution not in python since the algorithm is what matters.

Thank you.

why not `[1,2,3,4,5,6,7,8,9,10,11]`. I see no reason these numbers are not included since they do not have to be adjacent. — Serdalis, Dec 29 '11 at 06:37
Can the longest consecutive sequence start at a number other than 1? — Josh Rosen, Dec 29 '11 at 06:40
And yes, the longest consecutive set can start from any integer. — dangerChihuahua007, Dec 29 '11 at 06:46
I'm not quite clear on what you're actually expecting this algorithm to do. Should it find the longest run of consecutive integers among the elements of the list? Should it find the longest run of integers such that both the integers and their positions in the list are in increasing order? (i.e. is this literally the [longest subsequence problem](http://en.wikipedia.org/wiki/Longest_increasing_subsequence)?) Something else? Perhaps some more sample inputs/outputs would help. Example: what is the expected result for `[5,3,6,10,13,5,2,11,15,8,15]`? Or `[7,6,5,4,1,2,3]`? — David Z, Dec 29 '11 at 07:13
@ChiZeng It looks simple enough -- move left to right, the run length for the current element is found by adding one to a lookup of the run length of the integer preceding the current element. This is a simple, one-pass O(n) algorithm. — Raymond Hettinger, Dec 29 '11 at 10:30

Raymond Hettinger · Accepted Answer · 2011-12-30T21:42:03.807

15

Here is a simple one-pass O(n) solution:

s = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11,42]
maxrun = -1
rl = {}
for x in s:
    run = rl[x] = rl.get(x-1, 0) + 1
    print x-run+1, 'to', x
    if run > maxrun:
        maxend, maxrun = x, run
print range(maxend-maxrun+1, maxend+1)

The logic may be a little more self-evident if you think in terms of ranges instead of individual variables for the endpoint and run length:

rl = {}
best_range = xrange(0)
for x in s:
    run = rl[x] = rl.get(x-1, 0) + 1
    r = xrange(x-run+1, x+1)
    if len(r) > len(best_range):
        best_range = r
print list(best_range)

edited Dec 30 '11 at 21:42

answered Dec 29 '11 at 08:29

Raymond Hettinger

216,523
63
388
485

@RaymondHettinger - should that last line be: `print range(maxend-maxrun+1, maxend+1)`? Otherwise for `s = [1,4,2,3,5,4,9,10,11,5,6,7,8,1,3,4,5]` I only get `[4, 5, 6, 7, 8]`, not `[1, 2, 3, 4, 5, 6, 7, 8]`. – PaulMcG Dec 29 '11 at 11:45
@nightcracker - did you run it and get an IndexError, or are you just running this in your head? The chained assignment works right-to-left, and rl.get has a default value of 0 passed in, so no IndexError there. And since rl[1] gets the value of 0+1=1, then it can be copied to `run`, and again, no IndexError. Try running this, it really does work correctly. – PaulMcG Dec 29 '11 at 12:34
@Paul McGuire, agree, I think it should be maxrun instead of run too. – sunqiang Dec 29 '11 at 13:09
This solution only works if the data is sorted. That is, it won't account for 1, 2, 3, 5, 6, 7, 8, 4. Entering 4 last won't update the values at 5, 6, 7, 8. So unless you take into account the sort complexity, it isn't O(n). – Brett Stottlemyer Dec 30 '11 at 20:36
@BrettStottlemyer You just have a different understanding of what question the OP was asking. My interpretation (and that of some other respondents) was that a sorted, consecutive, increasing subsequence was being sought. That reading is backed-up by the OP's requirement that "the longest list should run forwards". With that requirement, the addition of the 4 should not update the values at 5,6,7,and 8. Hence, your downvote is unwarranted and incorrect. – Raymond Hettinger Dec 30 '11 at 21:26
@Raymond Ok, I will accept that my interpretation is not what the OP is interested in and remove my downvote. Hope it doesn't confuse others into thinking this is a solution that finds the longest sequence, rather than the longest ascending sequence. – Brett Stottlemyer Dec 30 '11 at 21:36
@RaymondHettinger Sorry, thought I could remove my downvote without an edit. SO won't let me, though. – Brett Stottlemyer Dec 30 '11 at 21:38

score 3 · Answer 2 · answered Dec 29 '11 at 06:54

3

Not that clever, not O(n), could use a bit of optimization. But it works.

def longest(seq):
  result = []
  for v in seq:
    for l in result:
      if v == l[-1] + 1:
        l.append(v)
    else:
      result.append([v])
  return max(result, key=len)

answered Dec 29 '11 at 06:54

Ignacio Vazquez-Abrams

776,304
153
1,341
1,358

Actually there is no *O*(n) implementation for this :-) – Abhijit Dec 29 '11 at 07:06
This is O(n^2), as is mine. Need to think of a different approach. – jeffknupp Dec 29 '11 at 08:15
1

@Abhijit: There is, look at Raymond Hettingers's. – orlp Dec 29 '11 at 12:35

Abhijit · Answer 3 · 2011-12-29T07:18:48.980

2

You can use The Patience Sort implementation of the Largest Ascending Sub-sequence Algorithm

def LargAscSub(seq):
    deck = []
    for x in seq:
        newDeck = [x]
        i = bisect.bisect_left(deck, newDeck)
        deck[i].insert(0, x) if i != len(deck) else deck.append(newDeck)
    return [p[0] for p in deck]

And here is the Test results

>>> LargAscSub([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> LargAscSub([1, 2, 3, 11, 12, 13, 14])
[1, 2, 3, 11, 12, 13, 14]
>>> LargAscSub([11,12,13,14])
[11, 12, 13, 14]

The Order of Complexity is O(nlogn)

There was one note in the wiki link where they claimed that you can achieve O(n.loglogn) by relying on Van Emde Boas tree

edited Dec 29 '11 at 07:18

answered Dec 29 '11 at 06:48

Abhijit

62,056
18
131
204

2

Doesn't the result have to be *consecutive* integers? – srgerg Dec 29 '11 at 06:53
@srgerg, Just check the above commented question from Serdalis and Chi Zeng's reply – Abhijit Dec 29 '11 at 06:58
It's not largest ascending, it's largest consecutive ascending. – jeffknupp Dec 29 '11 at 07:27

jb. · Answer 4 · 2011-12-29T09:13:16.443

How about using a modified Radix Sort? As JanneKarila pointed out the solution is not O(n). It uses Radix sort, which wikipedia says Radix sort's efficiency is O(k·n) for n keys which have k or fewer digits.

This will only work if you know the range of numbers that we're dealing with so that will be the first step.

Look at each element in starting list to find lowest, l and highest, h number. In this case l is 1 and h is 11. Note, if you already know the range for some reason, you can skip this step.
Create a result list the size of our range and set each element to null.
Look at each element in list and add them to the result list at the appropriate place if needed. ie, the element is a 4, add a 4 to the result list at position 4. result[element] = starting_list[element]. You can throw out duplicates if you want, they'll just be overwritten.
Go through the result list to find the longest sequence without any null values. Keep a element_counter to know what element in the result list we're looking at. Keep a curr_start_element set to the beginning element of the current sequence and keep a curr_len of how long the current sequence is. Also keep a longest_start_element and a `longest_len' which will start out as zero and be updated as we move through the list.
Return the result list starting at longest_start_element and taking longest_len

EDIT: Code added. Tested and working

#note this doesn't work with negative numbers
#it's certainly possible to write this to work with negatives
# but the code is a bit hairier
import sys
def findLongestSequence(lst):
    #step 1
    high = -sys.maxint - 1

    for num in lst:
        if num > high:
            high = num

    #step 2
    result = [None]*(high+1)

    #step 3
    for num in lst:
        result[num] = num

    #step 4
    curr_start_element = 0
    curr_len = 0
    longest_start_element = -1
    longest_len = -1

    for element_counter in range(len(result)):
        if result[element_counter] == None:

            if curr_len > longest_len:
                longest_start_element = curr_start_element
                longest_len = curr_len

            curr_len = 0
            curr_start_element = -1

        elif curr_start_element == -1:
            curr_start_element = element_counter

        curr_len += 1

    #just in case the last element makes the longest
    if curr_len > longest_len:
        longest_start_element = curr_start_element
        longest_len = curr_len


    #step 5
    return result[longest_start_element:longest_start_element + longest_len-1]

Step 4 iterates over the result list n times, so this is not O(n). — jeffknupp, Dec 29 '11 at 07:54
@jknupp No, you just need to go through one time. It's the same as finding the max value from a list. Except it finds the longest sequence in the list. suppose list = `[1,2,3,null,5,6,7,8,null,10]` I see that `[1,2,3]` is length 3 so I save the start index. Then see that `[5,6,7,8]` is length 4 so update the longest index/length vars. `[8]` doesn't change it. One loop, found the longest. — jb., Dec 29 '11 at 08:12
The n in O(n) refers to the size of the input list. The range of values can be vastly larger and independent from the length of list. — Janne Karila, Dec 29 '11 at 08:50
@JanneKarila my mistake, you are quite right. from wikipedia `Radix sort's efficiency is O(k·n) for n keys which have k or fewer digits.` — jb., Dec 29 '11 at 08:59

srgerg · Answer 5 · 2011-12-29T08:28:43.283

If the result really does have to be a sub-sequence of consecutive ascending integers, rather than merely ascending integers, then there's no need to remember each entire consecutive sub-sequence until you determine which is the longest, you need only remember the starting and ending values of each sub-sequence. So you could do something like this:

def longestConsecutiveSequence(sequence):
    # map starting values to largest ending value so far
    map = collections.OrderedDict()

    for i in sequence:
        found = False
        for k, v in map.iteritems():
            if i == v:
                map[k] += 1
                found = True

        if not found and i not in map:
            map[i] = i + 1

    return xrange(*max(map.iteritems(), key=lambda i: i[1] - i[0]))

If I run this on the original sample date (i.e. [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]) I get:

>>> print list(longestConsecutiveSequence([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

If I run it on one of Abhijit's samples [1,2,3,11,12,13,14], I get:

>>> print list(longestConsecutiveSequence([1,2,3,11,12,13,14]))
[11, 12, 13, 14]

Regrettably, this algorithm is O(n*n) in the worst case.

score 0 · Answer 6 · answered Dec 29 '11 at 08:41

Warning: This is the cheaty way to do it (aka I use python...)

import operator as op
import itertools as it

def longestSequence(data):

    longest = []

    for k, g in it.groupby(enumerate(set(data)), lambda(i, y):i-y):
        thisGroup = map(op.itemgetter(1), g)

        if len(thisGroup) > len(longest):
            longest = thisGroup

    return longest


longestSequence([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11, 15,15,16,17,25])

score 0 · Answer 7 · answered Dec 29 '11 at 09:28

You need the Maximum contiguous sum(Optimal Substructure):

def msum2(a):
    bounds, s, t, j = (0,0), -float('infinity'), 0, 0

    for i in range(len(a)):
        t = t + a[i]
        if t > s: bounds, s = (j, i+1), t
        if t < 0: t, j = 0, i+1
    return (s, bounds)

This is an example of dynamic programming and is O(N)

score 0 · Answer 8 · answered Dec 29 '11 at 10:01

O(n) solution works even if the sequence does not start from the first element.

Warning does not work if len(A) = 0.

A = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
def pre_process(A): 
    Last = {}
    Arrow = []
    Length = []
    ArgMax = 0
    Max = 0
    for i in xrange(len(A)): 
        Arrow.append(i)
        Length.append(0)  
        if A[i] - 1 in Last: 
            Aux = Last[A[i] - 1]
            Arrow[i] = Aux
            Length[i] = Length[Aux] + 1
        Last[A[i]] = i 
        if Length[i] > Max:
            ArgMax = i 
            Max = Length[i]
    return (Arrow,ArgMax)  

(Arr,Start) = pre_process(A) 
Old = Arr[Start] 
ToRev = []
while 1: 
    ToRev.append(A[Start]) 
    if Old == Start: 
        break
    Start = Old 
    New = Arr[Start]
    Old = New
ToRev.reverse()
print ToRev

Pythonizations are welcome!!

Spencer Rathbun · Answer 9 · 2011-12-29T19:33:03.667

Ok, here's yet another attempt in python:

def popper(l):
    listHolders = []
    pos = 0
    while l:
        appended = False
        item = l.pop()
        for holder in listHolders:
            if item == holder[-1][0]-1:
                appended = True
                holder.append((item, pos))
        if not appended:
            pos += 1
            listHolders.append([(item, pos)])
    longest = []
    for holder in listHolders:
        try:
            if (holder[0][0] < longest[-1][0]) and (holder[0][1] > longest[-1][1]):
                longest.extend(holder)
        except:
            pass
        if len(holder) > len(longest):
            longest = holder
    longest.reverse()
    return [x[0] for x in longest]

Sample inputs and outputs:

>>> demo = list(range(50))
>>> shuffle(demo)
>>> demo
[40, 19, 24, 5, 48, 36, 23, 43, 14, 35, 18, 21, 11, 7, 34, 16, 38, 25, 46, 27, 26, 29, 41, 8, 31, 1, 33, 2, 13, 6, 44, 22, 17,
 12, 39, 9, 49, 3, 42, 37, 30, 10, 47, 20, 4, 0, 28, 32, 45, 15]
>>> popper(demo)
[1, 2, 3, 4]
>>> demo = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
>>> popper(demo)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>>

jeffknupp · Answer 10 · 2011-12-29T07:29:22.773

-2

This should do the trick (and is O(n)):

target = 1
result = []
for x in list:
    for y in result:
        if y[0] == target:
            y[0] += 1
            result.append(x)

For any starting number, this works:

result = []
for x in mylist:
    matched = False
    for y in result:
        if y[0] == x:
            matched = True
            y[0] += 1
            y.append(x)
    if not matched:
        result.append([x+1, x])
return max(result, key=len)[1:]

edited Dec 29 '11 at 07:29

answered Dec 29 '11 at 06:38

jeffknupp

5,966
3
28
29

5

This will find the *first*, not the largest, starting with 1. `[2, 3, 4, 5, 1, 2]` – Ignacio Vazquez-Abrams Dec 29 '11 at 06:41
Wow, that's clever, thanks. How about for `[1, 2, 3, 11, 12, 13, 14]` though? Would this algorithm just return `[1, 2, 3]`? – dangerChihuahua007 Dec 29 '11 at 06:43
@ChiZeng, what about `[11, 12, 13, 14]`? Wouldn't that be the largest consecutive set of numbers? – Makoto Dec 29 '11 at 06:48
Indeed, that's what the algorithm is supposed to return, but I think this implementation just returns `[1, 2, 3]` because `target` starts at 1. – dangerChihuahua007 Dec 29 '11 at 06:58
Oh wait, I think the changes fix it. – dangerChihuahua007 Dec 29 '11 at 06:59
1

why dont you or upvoters check the code?. How could you subscribe `y` the first time ? (`TypeError: 'int' object is unsubscriptable`) – joaquin Dec 29 '11 at 07:02
This has been updated, I was initially trying to write psudeo-code, but then wrote the real code. Tested on both inputs and returns the correct answers. My apologies. – jeffknupp Dec 29 '11 at 07:03
@jknupp, You are almost using the Patience Sort Implementation, except that you can do a binary search instead of linear – Abhijit Dec 29 '11 at 07:05
I just tested both version on the input list (after refreshing the page to make sure I have the latest version) and both raise errors for me. So I have to downvote this. – David Z Dec 29 '11 at 07:08
@David Zalavsky What error are you getting? This works fine for me. – jeffknupp Dec 29 '11 at 07:09
1

The first code sample returns an empty list, and the second raises `TypeError: 'int' object is not subscriptable` on the line `if y[0] == x`. – David Z Dec 29 '11 at 07:16
correct `result.append(x)` with `y.append(x)` in the second algorithm ? sorry no still doesnt work – joaquin Dec 29 '11 at 07:17
1

Also `False` should be capitalized, but I fixed that before I ran it. – David Z Dec 29 '11 at 07:19
@joaquin: okay, that helps, no more errors now in code sample 2. However I'm still not satisfied the algorithm gives the correct results (then again I'm not quite sure what the correct results are supposed to be, other than in the one example given). – David Z Dec 29 '11 at 07:21
with the new editions, correcting False and changing result.append(x) with y.append(x) in the second algorithm it works finally (as far as I can tested). If the poster says it works then how difficult could be to copy-paste her session here ? – joaquin Dec 29 '11 at 07:23
@joaquin Sorry, too many edits. Got half psuedo-code and half of the correct implementation. Should be totally fixed now. – jeffknupp Dec 29 '11 at 07:24
@DavidZaslavsky Would you please read through my answer and see if it'll work? – jb. Dec 29 '11 at 07:40
@jknupp why are you trying to subscript an int? `y[0]` – jb. Dec 29 '11 at 07:47
@jb. As I commented on the question, I'm not even entirely sure what the correct behavior is, so pending clarification from the OP, I couldn't tell you whether your algorithm is correct. Besides, I don't think I have the time to parse a written description of an algorithm. Providing sample code would make it easier to test. – David Z Dec 29 '11 at 08:14
@DavidZaslavsky Fair enough, I've added python code that finds the solution to the problem as I see it. – jb. Dec 29 '11 at 09:14

How does one find the largest consecutive set of numbers in a list that are not necessarily adjacent?

10 Answers10

Linked