42

I was asked this in an interview. Given a list of integers, how can we find the biggest interval that has all of its members in the given list?

E.g. given the list 1, 3, 5, 7, 4, 6, 10, the answer would be [3, 7], because the list contains all the elements between 3 and 7.

I tried to answer, but I wasn't convincing. The approach I took was to first sort the list and then scan it for the biggest interval. But I was asked to do it in O(n).

Shahbaz
Jayram

10 Answers

35

I know a solution based on hashing and dynamic programming. Let f(x) be the hash function. The trick is in the hash-table value: consider the longest interval contained in the list that either starts or ends with x; then h[f(x)] = y, where y is the other end of that interval. Note that the length of that interval will be abs(x - y) + 1. The algorithm description below will make it clear why we store that value.

Walk over the list. Let i be the current index and x := list[i] the current number. Now:

1. if h[f(x)] is not empty, then we've met number x before. Nothing to do, continue.

2. Check h[f(x-1)] and h[f(x+1)].

2.1. If they're both not empty, that means we've already met x-1 and x+1, and we know some intervals [a..x-1] and [x+1..b] which we've already met in the list. We know it because a = h[f(x-1)] and b = h[f(x+1)] by definition of h. Now that we've got x, we have met the whole interval [a, b], so we update the values as follows: h[f(a)] := b and h[f(b)] := a.
Also set h[f(x)] to some value (let's say x, so as not to affect the answer), just so that the next time we meet x in the list, we ignore it; x has already done its job.

2.2. If only one of them is set, let's say h[f(x-1)] = a, that means we've already met some interval [a..x-1], and now it's extended with x. Update will be h[f(a)] := x and h[f(x)] := a.

2.3. If none of them is set, that means we've met neither x-1 nor x+1, and the biggest interval containing x that we've met so far is the singleton [x] itself. So set h[f(x)] := x.

Finally, to get the answer, pass over the whole list and take maximum abs( x - h[f(x)] ) + 1 for all x.
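
For concreteness, here is a minimal Python sketch of these steps (not from the original answer; a plain dict stands in for h∘f, and the three cases collapse into one update via .get defaults):

def biggest_interval(lst):
    h = {}
    for x in lst:
        if x in h:                   # 1. we've met x before: nothing to do
            continue
        a = h.get(x - 1, x)          # far end of the interval ending at x - 1, or x itself
        b = h.get(x + 1, x)          # far end of the interval starting at x + 1, or x itself
        h[x] = x                     # mark x as seen; harmless for interior values
        h[a], h[b] = b, a            # 2. the whole interval [a, b] has now been met
    # final pass: the longest interval has length abs(x - h[x]) + 1
    x = max(h, key=lambda v: abs(v - h[v]))
    return min(x, h[x]), max(x, h[x])

print(biggest_interval([1, 3, 5, 7, 4, 6, 10]))   # -> (3, 7)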

Grigor Gevorgyan
  • +1, looks like working solution, have to check it later when i have more time – Roman Pekar Jul 11 '13 at 12:09
  • @Grigor What is current number in this equation? x = list[ i ] - current number – Aravind Jul 11 '13 at 13:39
  • @Aravind: The value of list[ i ]. Maybe I should write x := list[ i ], I'll correct – Grigor Gevorgyan Jul 11 '13 at 13:44
  • @GrigorGevorgyan: I'm sorry, I still don't understand. For instance, for the list {1,3,5,4}, I iterate through the list, and first i=0, list[i]=1. Now what is x? – Aravind Jul 11 '13 at 13:47
  • @Aravind: x = list[0] = 1. I just denote list[i] as x at each step. It's just comfortable. – Grigor Gevorgyan Jul 11 '13 at 13:55
  • @GrigorGevorgyan: I'm trying to implement it, and the problem is that in step 2.1 you say [x+1..b]; why can't the longest interval with x+1 as one end point have its other end point at p where p<(x+1)? – Aravind Jul 11 '13 at 14:12
  • @Aravind: If so, then we would have met x, because p<=x – Grigor Gevorgyan Jul 11 '13 at 14:14
  • "Note that length of that interval will be abs( x - y + 1 )." Do you mean abs(x - y) + 1 ? – LarsH Jul 11 '13 at 14:20
  • @GrigorGevorgyan: Here's the working code: http://ideone.com/ABoRgz – Aravind Jul 11 '13 at 14:34
  • @GrigorGevorgyan: Let me point out a possible problem with this approach. The OP has said that the range of numbers will not be known. I know we could find the range with one traversal through the list. But how is this different from doing counting sort, since both are time complexity O(N) and space complexity O(M), where M = range of numbers in the input? In fact your time complexity is O(N+M). So can you point out how this is better than counting sort? Your method is totally cool BTW, +1-ed it. Just curious. – Aravind Jul 11 '13 at 14:45
  • @Aravind: That's why you should use a hash, which I happen not to see in your code. Hence the size of your hashtable will be O(N), where N is the size of the list. – Grigor Gevorgyan Jul 11 '13 at 14:56
  • @GrigorGevorgyan: Cool! Yeah! So we need a list to resolve collisions. It's O(N) size. Thanks! – Aravind Jul 11 '13 at 14:58
  • @GrigorGevorgyan Seems that the input 2,3,5,6,7,9,10 returns (2,3) when it should return (5,7). – r-s Sep 20 '13 at 21:48
  • @GrigorGevorgyan just awesome. I wonder why posts like yours get so few upvotes! – Aseem Goyal Dec 06 '13 at 08:08
9

If sorting is not desirable, you can use a combination of a hash map and a disjoint-set data structure.

For each element in the list, create a node and insert it into the hash map with key = element's value. Then query the hash map for value+1 and value-1. If anything is found, combine the current node with the set(s) to which the adjacent nodes belong. When finished with the list, the largest set corresponds to the biggest interval.

Time complexity is O(N * α(N)), where α(N) is the inverse Ackermann function.
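
For illustration, here is a rough Python sketch of this approach (not from the original answer; the union-find is keyed directly by the values, uses union by size with path halving, and duplicates are skipped via the same map):

def largest_interval(values):
    parent, size, lo, hi = {}, {}, {}, {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        if size[ra] < size[rb]:              # union by size
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]
        lo[ra] = min(lo[ra], lo[rb])         # track the interval covered by each set
        hi[ra] = max(hi[ra], hi[rb])

    for v in values:
        if v in parent:                      # duplicate: already handled
            continue
        parent[v] = v
        size[v], lo[v], hi[v] = 1, v, v
        if v - 1 in parent:                  # query value - 1
            union(v, v - 1)
        if v + 1 in parent:                  # query value + 1
            union(v, v + 1)

    root = max((r for r in parent if find(r) == r), key=lambda r: size[r])
    return lo[root], hi[root]

print(largest_interval([1, 3, 5, 7, 4, 6, 10]))   # -> (3, 7)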

Edit: Actually a disjoint-set is too powerful for this simple task. The solution by Grigor Gevorgyan does not use it, so it is simpler and more efficient.

Evgeny Kluev
  • Good approach and really close to linear for any reasonable number – Ivaylo Strandjev Jul 11 '13 at 11:25
  • @Jayram: I didn't understand your last comment. I mean when you process, for example, value 5 from the list, you search 4 and 6, and combine the current node with the sets where 4 and/or 6 belong. – Evgeny Kluev Jul 11 '13 at 11:27
  • @EvgenyKluev btw you will have problems with repeating numbers. I guess you will need to keep track of the left and right end of each tree in the disjoint-set forest – Ivaylo Strandjev Jul 11 '13 at 11:29
  • @IvayloStrandjev: that's right; alternatively we could just get rid of duplicates using the same hash map. – Evgeny Kluev Jul 11 '13 at 11:31
  • +1 This seems spiritually akin to the counting sort solution, but by using a hashmap instead of a bitmap you can sidestep the problem of a really large or sparse list. – kojiro Jul 11 '13 at 11:31
  • Can we really accept that a hash map query is `O(1)`? There are no guarantees that the input is random/uniform – default locale Jul 11 '13 at 11:33
  • @defaultlocale: hash map insert/query is O(1) expected time and this does not require input to be random/uniform. And yes, worst-case time is O(N). – Evgeny Kluev Jul 11 '13 at 11:35
  • @kojiro the advantage here - the reason this works where counting sort fails - is that we don't really need the sorted sequence, only the beginning and the end of each interval. – Ivaylo Strandjev Jul 11 '13 at 11:39
  • I've tried to say that input can be generated to attack hash function (or interviewer can treat hash as a subject for collisions). Anyway +1 for practically acceptable solution. – default locale Jul 11 '13 at 11:39
  • @defaultlocale: I think, in many cases this attack could be prevented by choosing hash function randomly from a large set of hash functions. – Evgeny Kluev Jul 11 '13 at 11:45
  • @IvayloStrandjev actually I wasn't talking about a literal Counting Sort, I was referring to RomanPekar's original answer which was to create a bit array and mask out the existing members. – kojiro Jul 11 '13 at 12:00
5

You can trade off space to get this in linear time.

  1. Scan the list for the smallest and largest values, S and L.
  2. Use an array of booleans or a bitvector, A, large enough to hold (L - S + 1) entries.
  3. Go through the list again, setting the appropriate element of A to true when you see it.
  4. Now A represents the values in sorted order. Go through A and find the longest consecutive run of true values.

Steps 1 and 3 are linear in the size of your list. Steps 2 and 4 are linear in the size of A, which could be large relative to your list if you have just a few values that are far apart. But, since you're dealing with ints, A is bounded.
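
A short Python sketch of these four steps (not part of the original answer; assumes a non-empty list, names are illustrative):

def largest_interval(nums):
    lo, hi = min(nums), max(nums)          # step 1: smallest and largest values
    present = [False] * (hi - lo + 1)      # step 2: one flag per value in [lo, hi]
    for n in nums:                         # step 3: mark the values we saw
        present[n - lo] = True

    best_start = best_len = 0
    run_start = run_len = 0
    for i, seen in enumerate(present):     # step 4: longest run of true flags
        if seen:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    return lo + best_start, lo + best_start + best_len - 1

print(largest_interval([1, 3, 5, 7, 4, 6, 10]))   # -> (3, 7)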

Dave
  • No way is this log time. Linear, yes. But not log. Two of your steps are not even bounded linear - there could be an indefinite number of values since we weren't told there are no duplicates. A is bounded, but only by MAX_INT, which is kind of a large range to scan through. – Darrel Hoffman Jul 11 '13 at 13:15
  • D'oh! Yes, obviously linear, and can't be better. Wrote that before my first coffee. – Dave Jul 11 '13 at 14:09
  • Your step 4 is `O(L - S)`, and that's unbounded -- the question isn't talking about `int32` or something like that. It just says "integers". – balpha Jul 11 '13 at 14:25
  • [This answer](http://stackoverflow.com/a/17592633/1711796) provides a work-around to avoid O(L-S) on step 4, but I believe to simply create the array still has that complexity. – Bernhard Barker Jul 11 '13 at 14:44
  • Anyway, Grigor's answer is superior. – Dave Jul 11 '13 at 16:13
  • This uses O(L-S) space, which is completely unacceptable - L could be MAXINT and S be MININT... – einpoklum Jul 16 '13 at 08:06
3

Idea 1: well, I think you have to kind of sort the list anyway, but you can't go with merge sort or quicksort. If you have the memory, though, you could use the idea behind counting sort for integers.

So you can create an array of 0s and 1s indexed from 0 to the maximum int value, set a 1 at each position whose value appears in the list, and then find the longest contiguous run of 1s.

Idea 2: create a dictionary of the values and find the min and max - all O(N) operations:

dict = {1: 1, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 10: 10}
min = 1
max = 10

then go over i in range(min, max) and find the longest continuous subset:

>>> d = [1, 3, 5, 7, 4, 6, 10]
>>> s = set(d)
>>> mind = min(d)
>>> maxd = max(d)
>>> a, b, j = mind, mind, mind

>>> for i in range(mind, maxd + 2):  # maxd + 1 acts as a sentinel that closes the last run
        if i not in s:
            if (b - a) < (i - j - 1):
                a, b = j, i - 1
            j = i + 1

>>> a, b
(3, 7)

but this could be slow for sparse lists like [1, 9000, 100000]

EDIT: based on the super great answer of Grigor Gevorgyan, here's the code for the O(N) dictionary solution in Python (I just love its simplicity!!!)

l = [1, 3, 5, 7, 4, 6, 10]
d = {x:None for x in l}
print d
for (k, v) in d.iteritems():
    if v is not None: continue             # k is already part of a known run
    a, b = d.get(k - 1), d.get(k + 1)      # far ends of the runs adjacent to k, if any
    if a is not None and b is not None: d[k], d[a], d[b] = k, b, a
    elif a is not None: d[a], d[k] = k, a
    elif b is not None: d[b], d[k] = k, b
    else: d[k] = k
    print d

m = max(d, key=lambda x: d[x] - x)
print m, d[m]

output:

{1: None, 3: None, 4: None, 5: None, 6: None, 7: None, 10: None}
{1: 1, 3: None, 4: None, 5: None, 6: None, 7: None, 10: None}
{1: 1, 3: 3, 4: None, 5: None, 6: None, 7: None, 10: None}
{1: 1, 3: 4, 4: 3, 5: None, 6: None, 7: None, 10: None}
{1: 1, 3: 5, 4: 3, 5: 3, 6: None, 7: None, 10: None}
{1: 1, 3: 6, 4: 3, 5: 3, 6: 3, 7: None, 10: None}
{1: 1, 3: 7, 4: 3, 5: 3, 6: 3, 7: 3, 10: None}
{1: 1, 3: 7, 4: 3, 5: 3, 6: 3, 7: 3, 10: 10}
3 7
Roman Pekar
2

I crafted a very straightforward solution using a HashSet. Since contains and remove are O(1) operations, you can simply create a new interval from a random set item and 'expand' the interval until you discover its full size, removing items from the set as you go along. The removal is key, because this is what prevents you from 'repeating' any intervals.

It might help to think about it this way - the list has K intervals, whose sizes add up to N. Your task, then, is to discover what these intervals are, without repeating any intervals or items. This is why the HashSet is perfect for the job - you can efficiently remove items from the set as you expand your intervals. Then all you need to do is keep track of the largest interval as you go along.

  1. Put the list into a HashSet
  2. While the set is non-empty:
    1. remove an item at random from the set
    2. Define a new interval from that item
    3. Expand the interval as follows:
      1. Define i = interval.start-1
      2. While the set contains i, remove i from the set and decrement both i and interval.start
      3. Repeat step 2 in the other direction (expand up from interval.end)
    4. If the expanded interval is larger than the previously largest interval, record the new interval as the largest interval
  3. Return the largest interval

Here is the solution in Java:

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class BiggestInterval {

    static class Interval {
        int start;
        int end;

        public Interval(int base) {
            this(base,base);
        }

        public Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        public int size() {
            return 1 + end - start;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + "]";
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        System.out.println(biggestInterval(Arrays.asList(1,3,5,7,4,6,10)));
    }

    public static Interval biggestInterval(List<Integer> list) {
        HashSet<Integer> set = new HashSet<Integer>(list);
        Interval largest = null;

        while(set.size() > 0) {
            Integer item = set.iterator().next();
            set.remove(item);

            Interval interval = new Interval(item);
            while(set.remove(interval.start-1)) {
                interval.start--;
            }
            while(set.remove(interval.end+1)) {
                interval.end++;
            }

            if (largest == null || interval.size() > largest.size()) {
                largest = interval;
            }
        }

        return largest;
    }
}
Kevin K
1

This would be linear, assuming the dictionaries are built on hash tables with average O(1) operations.

L = [1,3,5,7,4,6,10]

a_to_b = {}  # maps a run's start to its end
b_to_a = {}  # maps a run's end to its start

for i in L:
    if i+1 in a_to_b and i-1 in b_to_a:
        new_a = b_to_a[i-1]
        new_b = a_to_b[i+1]
        a_to_b[new_a] = new_b
        b_to_a[new_b] = new_a
        continue
    if i+1 in a_to_b:
        a_to_b[i] = a_to_b[i+1]
        b_to_a[a_to_b[i]] = i
    if i-1 in b_to_a:
        b_to_a[i] = b_to_a[i-1]
        a_to_b[b_to_a[i]] = i
    if not (i+1 in a_to_b or i-1 in b_to_a):
        a_to_b[i] = i
        b_to_a[i] = i

max_a_b = max_a = max_b = 0
for a,b in a_to_b.iteritems():
    if b-a > max_a_b:
        max_a = a
        max_b = b
        max_a_b = b-a

print max_a, max_b  
akalenuk
1

Here's a solution similar to Grigor's. The two main differences are that this solution stores the length of the consecutive run instead of the other endpoint, and that it eliminates the need for a final pass over the hash map.

  1. Iterate over the array

    • Build a hashmap by looking for and updating adjacent set endpoints:

      Key - The array values

      Value - When the key is an endpoint of a sequential set, store the length of that set. Otherwise, keep it truthy so you only consider things once.

    • If the current set size is longest, update the longest set size and longest set start.

Here's a JavaScript implementation for clarity, as well as a fiddle to see it in action:

var array = [1,3,5,7,4,6,10];

//Make a hash of the numbers - O(n) assuming O(1) insertion
var longestSetStart;
var longestSetSize = 0;

var objArray = {};
for(var i = 0; i < array.length; i++){
    var num = array[i];

    if(!objArray[num]){//Only consider numbers once
        objArray[num] = 1;//Initialize to 1 item in the set by default

        //Get the updated start and end of the current set
        var currentSetStart = num;//Starting value of the current set
        var currentSetEnd = num;//Ending value of the current set

        //Get the updated start of the set
        var leftSetSize = objArray[num - 1];
        if(leftSetSize){
            currentSetStart = num - leftSetSize;
        }

        //Get the updated end of the set
        var rightSetSize = objArray[num + 1];
        if(rightSetSize){
            currentSetEnd = num + rightSetSize;
        }

        //Update the endpoints
        var currentSetSize = currentSetEnd - currentSetStart + 1;
        objArray[currentSetStart] = currentSetSize;
        objArray[currentSetEnd] = currentSetSize;

        //Update if longest set
        if(currentSetSize > longestSetSize){
            longestSetSize = currentSetSize;
            longestSetStart = currentSetStart;
        }
    }
}

var longestSetEnd = longestSetStart + longestSetSize - 1;
Briguy37
0

The trick is to think of the items as a set instead of a list. This allows you to identify items that are at the start or end of contiguous ranges, because a set lets you check if item-1 or item+1 is present. With that, you can solve the problem in linear time and space.

Pseudo-Code:

  • Enumerate the items in the set, looking for ones that are at the start of a range (x starts a range when x-1 is not in the set).
  • For each value that is the start of a range, scan upwards until you find the corresponding end of range value (x ends a range when x+1 is not in the set). This gives you all the relevant contiguous ranges.
  • Return the contiguous range whose end was furthest from its start.

C# Code:

static Tuple<int, int> FindLargestContiguousRange(this IEnumerable<int> items) {
    var itemSet = new HashSet<int>(items);

    // find contiguous ranges by identifying their starts and scanning for ends
    var ranges = from item in itemSet

                 // is the item at the start of a contiguous range?
                 where !itemSet.Contains(item-1)

                 // find the end by scanning upward as long as we stay in the set
                 let end = Enumerable.Range(item, itemSet.Count)
                           .TakeWhile(itemSet.Contains)
                           .Last()

                 // represent the contiguous range as a tuple
                 select Tuple.Create(item, end);

    // return the widest contiguous range that was found
    return ranges.MaxBy(e => e.Item2 - e.Item1);
}

note: MaxBy is from MoreLinq

Testing

Small sanity check:

new[] {3,6,4,1,8,5}.FindLargestContiguousRange().Dump();
// prints (3, 6)

Big contiguous list:

var zeroToTenMillion = Enumerable.Range(0, (int)Math.Pow(10, 7)+1);
zeroToTenMillion.FindLargestContiguousRange().Dump();
// prints (0, 10000000) after ~1 second

Big fragmented list:

var tenMillionEvens = Enumerable.Range(0, (int)Math.Pow(10, 7)).Select(e => e*2);
var evensWithAFewOdds = tenMillionEvens.Concat(new[] {501, 503, 505});
evensWithAFewOdds.FindLargestContiguousRange().Dump();
// prints (500, 506) after ~3 seconds

Complexity

This algorithm requires O(N) time and O(N) space, where N is the number of items in the list, assuming the set operations are constant time.

Note that if the set were given as input, instead of being built by the algorithm, we would only need O(1) space.

(Some comments say this is quadratic time. I think they assumed all items, instead of just items at the starts of ranges, triggered scans. That would indeed be quadratic, if the algorithm worked that way.)

Craig Gidney
  • What is the complexity of that approach? Kind of looks like it could be O(n^2). It's kind of gibberish to anyone who doesn't know LINQ. – Bernhard Barker Jul 11 '13 at 14:24
  • "Enumerate the items in the set, and scanning each range for how far that range goes" - that looks like O(n^2) to me. Also LINQ is designed to hide complexity and the algorithms that are in use - so it is a bad fit to express a solution that requires thinking about complexity. – Jean Hominal Jul 11 '13 at 14:26
  • @Dukeling It's linear time. Scanning the range only happens from the start of a range, not the middle. – Craig Gidney Jul 11 '13 at 14:34
  • @JeanHominal I clarified that only items found to be at the start of a range, because e-1 is not in the set, trigger a scan. You're wrong about LINQ being difficult to analyze. It's easier, because the structure is clearer instead of hidden in the branches and breaks. Just do the same analysis you'd do of any functional-style algorithm. – Craig Gidney Jul 11 '13 at 14:38
  • `HashSet.Add` is only guaranteed to be O(n). Meaning that there is nothing that guarantees that the act of building the `HashSet` will be O(n) - it may very well be something like O(n ln(n)). Unless you can prove it is the case by digging in the implementation of `HashSet`, your solution does not work. In short: `HashSet` is not magic. – Jean Hominal Jul 12 '13 at 07:29
  • The pseudo-code may be O(n) (assuming the data and hash function allows for hash-set operations to be O(1)), but I'm far from convinced that the C# code is. Having it run on a set with a million or so elements in around a second or less might be enough to convince me. – Bernhard Barker Jul 12 '13 at 08:50
  • @JeanHominal It is extremely common to assume hash sets and hash tables have constant time operations. If you want to be pedantic, it's *expected* linear time. – Craig Gidney Jul 12 '13 at 09:52
  • @Dukeling I tested it with ten million items. It takes about three seconds. – Craig Gidney Jul 12 '13 at 10:05
-1

I think I would have sorted them into lists of consecutive integers (assuming each number can appear only once).

take the first number

is it 1 lower or 1 higher than a number in an existing list?

yes: prepend/append it to that existing list

no: create a new list starting with the current number

if there are more numbers, return to the top

display the longest list
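
A small Python sketch of this idea (not from the original answer; names are illustrative). One extra case turns out to be needed: when a number is adjacent to two existing lists, those two lists have to be merged, otherwise the example 1,3,5,7,4,6,10 never joins its pieces into [3, 7].

def longest_consecutive_list(nums):
    runs = []                                                    # each run is [low, high]
    for n in nums:                                               # take the first number / return to the top
        lower = next((r for r in runs if r[1] == n - 1), None)   # list ending just below n
        higher = next((r for r in runs if r[0] == n + 1), None)  # list starting just above n
        if lower is not None and higher is not None:             # n bridges two lists: merge them
            lower[1] = higher[1]
            runs.remove(higher)
        elif lower is not None:                                  # append to the existing list
            lower[1] = n
        elif higher is not None:                                 # prepend to the existing list
            higher[0] = n
        else:                                                    # create a new list
            runs.append([n, n])
    return max(runs, key=lambda r: r[1] - r[0])                  # display the longest list

print(longest_consecutive_list([1, 3, 5, 7, 4, 6, 10]))   # -> [3, 7]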

Toodleey
-1

Disclaimer: Since the solution is based on hashtables, the running times are expected, not worst case.

This O(n) solution depends on the integers being unique. If they are not unique, make a hashset with O(1) insertion and membership lookup, and just skip the numbers already encountered, as you go through the list.

  1. Make an O(1) lookup/insert hashmap where the values are the beginnings of ranges, and the keys are the numbers that fit at the end of those ranges. For a value v and a key k, this means that the range starting from v and ending with k-1 inclusive is located at key k.

  2. Go through the list of numbers. For each number n check if the map has a value v at key n. This corresponds to there being a range starting from v that would allow n at the end. If there is, move v to key n+1 and delete the entry at key n. If there isn't any range, insert n at key n+1.

  3. Since the numbers are unique, none of the ranges overlap in the end, but there may be some contiguous ones. Run through the key/value pairs of the map. For each key k and value v, if the map has a value v1 at key k1 = v, then it means that there is a range from v1 to k-1. Insert v1 at k, and delete the entry k1/v1.

  4. Go through the k/v entries of the map to find the largest range [v,k-1] of size k-v, using a running maximum.

For your example:

setup:
l = [1,3,5,7,4,6,10]
m = {}

iteration:
process 1  : m = {2->1}
process 3  : m = {2->1, 4->3}
process 5  : m = {2->1, 4->3, 6->5}
process 7  : m = {2->1, 4->3, 6->5, 8->7}
process 4  : m = {2->1, 5->3, 6->5, 8->7}
process 6  : m = {2->1, 5->3, 7->5, 8->7}
process 10 : m = {2->1, 5->3, 7->5, 8->7, 11->10}

concatenation of contiguous ranges:
initial:              m = {2->1, 5->3, 7->5, 8->7, 11->10}
first concatenation:  m = {2->1,       7->3, 8->7, 11->10}, k=7, v=5, k1=5, v1=3
second concatenation: m = {2->1,             8->3, 11->10}, k=8, v=7, k1=7, v1=3

result:
largest range : [3,7] of size 5
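
A compact Python sketch of steps 1-4 (not part of the original answer; illustrative only, and it assumes unique integers as stated). Step 3 follows the whole chain of merges for each key, so a single pass works regardless of the iteration order:

def largest_range(nums):
    # m maps "end + 1" -> "start": an entry k -> v represents the range [v, k - 1]
    m = {}

    # step 2: extend the range that was waiting for n, or start a new one at n
    for n in nums:
        start = m.pop(n, n)
        m[n + 1] = start

    # step 3: concatenate contiguous ranges by following each chain of merges
    for k in list(m):
        if k not in m:             # already absorbed into an earlier chain
            continue
        v = m[k]
        while v in m:              # a range ending at v - 1 exists; merge it in
            v = m.pop(v)
        m[k] = v

    # step 4: the largest range [v, k - 1] maximizes k - v
    k, v = max(m.items(), key=lambda kv: kv[0] - kv[1])
    return v, k - 1

print(largest_range([1, 3, 5, 7, 4, 6, 10]))   # -> (3, 7)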
Boris
  • Step 2 is linear in the number of ranges, which is O(n), so this is O(n^2). – Dave Jul 11 '13 at 12:24
  • @DaveGalvin: No, step 2 doesn't go through the ranges sequentially. For each number it checks if the map has an entry at that number. With a hashtable map implementation that is an expected O(1) operation. – Boris Jul 11 '13 at 12:31
  • @DaveGalvin: updated answer to make step 2 more clear – Boris Jul 11 '13 at 12:38