17

Given an unsorted array of positive integers, find the length of the longest subarray whose elements when sorted are continuous. Can you think of an O(n) solution?

Example:

{10, 5, 3, 1, 4, 2, 8, 7}, answer is 5.

{4, 5, 1, 5, 7, 6, 8, 4, 1}, answer is 5.

For the first example, the subarray {5, 3, 1, 4, 2} when sorted can form a continuous sequence 1,2,3,4,5, which are the longest.

For the second example, the subarray {5, 7, 6, 8, 4} is the result subarray.

I can think of a method which for each subarray, check if (maximum - minimum + 1) equals the length of that subarray, if true, then it is a continuous subarray. Take the longest of all. But it is O(n^2) and can not deal with duplicates.

Can someone gives a better method?

shilk
  • 589
  • 5
  • 17

7 Answers7

2

Algorithm to solve original problem in O(n) without duplicates. Maybe, it helps someone to develop O(n) solution that deals with duplicates.

Input: [a1, a2, a3, ...]

Map original array as pair where 1st element is a value, and 2nd is index of array.

Array: [[a1, i1], [a2, i2], [a3, i3], ...]

Sort this array of pairs with some O(n) algorithm (e.g Counting Sort) for integer sorting by value. We get some another array:

Array: [[a3, i3], [a2, i2], [a1, i1], ...]

where a3, a2, a1, ... are in sorted order.

Run loop through sorted array of pairs

In linear time we can detect consecutive groups of numbers a3, a2, a1. Consecutive group definition is next value = prev value + 1. During that scan keep current group size (n), minimum value of index (min), and current sum of indices (actualSum).

On each step inside consecutive group we can estimate sum of indices, because they create arithmetic progression with first element min, step 1, and size of group seen so far n. This sum estimate can be done in O(1) time using formula for arithmetic progression:

estimate sum = (a1 + an) * n / 2;

estimate sum = (min + min + (n - 1)) * n / 2;

estimate sum = min * n + n * (n - 1) / 2;

If on some loop step inside consecutive group estimate sum equals to actual sum, then seen so far consecutive group satisfy the conditions. Save n as current maximum result, or choose maximum between current maximum and n.

If on value elements we stop seeing consecutive group, then reset all values and do the same.

Code example: https://gist.github.com/mishadoff/5371821

mishadoff
  • 10,719
  • 2
  • 33
  • 55
  • 1
    you don't need the sum, you could use [`max - min + 1 == n` instead](http://ideone.com/kk6yT2). Your code uses `O(n*log(n))` sort. It is not clear how to sort integers in `O(n)` if there is no upper limit for integers e.g., if there are `sqrt(n)` of `n**sqrt(n)` integers then the input size is still `O(n)` but Counting Sort is `O(n + maxdiff) = O(n + n**sqrt(n)) = O(n**sqrt(n))` or Radix Sort is `O(n*ndigits) = O(n * sqrt(n))`. I use computational model here that assumes that `n` can be stored in `O(1)` machine words. – jfs Apr 18 '13 at 10:43
  • Given that we are interested only in "continuous sequences", It might be possible to split the input into bins in `O(n)` such that `min(bin[j]) - max(bin[i]) > n` and `max(bin[i]) - min(bin[i])` is `O(n)`. And search for "continuous sequences" inside individual bins. It might lead to `O(n)` algorithm for input without duplicates. – jfs Apr 18 '13 at 10:44
  • 2
    This will not work for 100,80,17,12,10,15,14,16,19,30,13,70 . The algorithm will sort the array and start with 12 and goes on till 19 without starting from 14 which gives the answer. Correct answer for this example is 14,15,16 – siddardha Nov 16 '14 at 08:10
  • @mishadoff This won't work for arrays like this `[4,100,3,2,1000,1]`. Your algo will check for [1->5,2->3,3->2,4->0] and see that this isn't possible and give `1` as final answer. But `[2->3,3->2]` is valid and the answer should be `2` – duplex143 May 16 '21 at 14:39
1

UPD2: The following solution is for a problem when it is not required that subarray is contiguous. I misunderstood the problem statement. Not deleting this, as somebody may have an idea based on mine that will work for the actual problem.


Here's what I've come up with:

Create an instance of a dictionary (which is implemented as hash table, giving O(1) in normal situations). Keys are integers, values are hash sets of integers (also O(1)) – var D = new Dictionary<int, HashSet<int>>.

Iterate through the array A and for each integer n with index i do:

  1. Check whether keys n-1 and n+1 are contained in D.
    • if neither key exists, do D.Add(n, new HashSet<int>)
    • if only one of the keys exists, e.g. n-1, do D.Add(n, D[n-1])
    • if both keys exist, do D[n-1].UnionWith(D[n+1]); D[n+1] = D[n] = D[n-1];
  2. D[n].Add(n)

Now go through each key in D and find the hash set with the greatest length (finding length is O(1)). The greatest length will be the answer.

To my understanding, the worst case complexity will be O(n*log(n)), only because of the UnionWith operation. I don't know how to calculate the average complexity, but it should be close to O(n). Please correct me if I am wrong.

UPD: To speak code, here's a test implementation in C# that gives the correct result in both of the OP's examples:

var A = new int[] {4, 5, 1, 5, 7, 6, 8, 4, 1};
var D = new Dictionary<int, HashSet<int>>();

foreach(int n in A)
{
    if(D.ContainsKey(n-1) && D.ContainsKey(n+1))
    {
        D[n-1].UnionWith(D[n+1]);
        D[n+1] = D[n] = D[n-1];
    }
    else if(D.ContainsKey(n-1))
    {
        D[n] = D[n-1];
    }
    else if(D.ContainsKey(n+1))
    {
        D[n] = D[n+1];
    }
    else if(!D.ContainsKey(n))
    {
        D.Add(n, new HashSet<int>());
    }

    D[n].Add(n);
}

int result = int.MinValue;
foreach(HashSet<int> H in D.Values)
{
    if(H.Count > result)
    {
        result = H.Count;
    }
}

Console.WriteLine(result);
Dmytro Shevchenko
  • 33,431
  • 6
  • 51
  • 67
  • 2
    The solution of the array `[10, 5, 3, 1, 4, 2, 8, 7, 0]` should be `5`, the maximum subarray is `[5, 3, 1, 4, 2]`. The additional `0` in the end does not change this! However, your code returns the result `6`, as it assumes the element `0` to be part of the (non-contiguous!) maximal subarray. – blubb Apr 12 '13 at 12:36
  • @blubb OP never stated the subarray must be contiguous. Only the elements when sorted must be continuous. But now I see the problem statement is ambiguous. The OP may have implied contiguosity. – Dmytro Shevchenko Apr 12 '13 at 12:41
  • It is implied in the question: the solution to `[4, 5, 1, 5, 7, 6, 8, 4, 1]` would be `7` instead of `5` if non-contiguity were allowed. Additionally, the number of possible subsets would be `2^n` instead of `n^2`. – blubb Apr 12 '13 at 12:43
  • Even if the subarray must be contiguous in the original array, my solution can be slightly modified to respect that. Complexity will not change. The real question is: how complex is my solution? – Dmytro Shevchenko Apr 12 '13 at 12:45
  • @blubb I assumed that duplicates should not be counted. That might have been a wrong assumption. – Dmytro Shevchenko Apr 12 '13 at 12:46
  • The analysis for the stated code seems correct. I'd be interested to see the modified code though! – blubb Apr 12 '13 at 12:48
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/28108/discussion-between-blubb-and-shedal) – blubb Apr 12 '13 at 12:57
  • your algorithm doesn't solve the problem but its [time complexity is linear in practice](http://i.stack.imgur.com/h1plq.png). I've measured it using [the implementation in Python](http://ideone.com/BQjXNF). – jfs Apr 27 '13 at 17:50
1

See the array S in it's mathematical set definition :

S = Uj=0k (Ij)

Where the Ij are disjoint integer segments. You can design a specific interval tree (based on a Red-Black tree or a self-balancing tree that you like :) ) to store the array in this mathematical definitions. The node and tree structures should look like these :

struct node {
    int d, u;
    int count;
    struct node *n_left, *n_right;
}

Here, d is the lesser bound of the integer segment and u, the upper bound. count is added to take care of possible duplicates in the array : when trying to insert an already existing element in the tree, instead of doing nothing, we will increment the count value of the node in which it is found.

struct root {
    struct node *root;
}        

The tree will only store disjoint nodes, thus, the insertion is a bit more complex than a classical Red-Black tree insertion. When inserting intervals, you must scans for potential overflows with already existing intervals. In your case, since you will only insert singletons this should not add too much overhead.

Given three nodes P, L and R, L being the left child of P and R the right child of P. Then, you must enforce L.u < P.d and P.u < R.d (and for each node, d <= u, of course).

When inserting an integer segment [x,y], you must find "overlapping" segments, that is to say, intervals [u,d] that satisfies one of the following inequalities :

y >= d - 1
OR
x <= u + 1

If the inserted interval is a singleton x, then you can only find up to 2 overlapping interval nodes N1 and N2 such that N1.d == x + 1 and N2.u == x - 1. Then you have to merge the two intervals and update count, which leaves you with N3 such that N3.d = N2.d, N3.u = N1.u and N3.count = N1.count + N2.count + 1. Since the delta between N1.d and N2.u is the minimal delta for two segments to be disjoint, then you must have one of the following :

  • N1 is the right child of N2
  • N2 is the left child of N1

So the insertion will still be in O(log(n)) in the worst case.

From here, I can't figure out how to handle the order in the initial sequence but here is a result that might be interesting : if the input array defines a perfect integer segment, then the tree only has one node.

Rerito
  • 5,886
  • 21
  • 47
1

This will require two passes over the data. First create a hash map, mapping ints to bools. I updated my algorithm to not use map, from the STL, which I'm positive uses sorting internally. This algorithm uses hashing, and can be easily updated for any maximum or minimum combination, even potentially all possible values an integer can obtain.

#include <iostream>

using namespace std;
const int MINIMUM = 0;
const int MAXIMUM = 100;
const unsigned int ARRAY_SIZE = MAXIMUM - MINIMUM;

int main() {

bool* hashOfIntegers = new bool[ARRAY_SIZE];
//const int someArrayOfIntegers[] = {10, 9, 8, 6, 5, 3, 1, 4, 2, 8, 7};
//const int someArrayOfIntegers[] = {10, 6, 5, 3, 1, 4, 2, 8, 7};
const int someArrayOfIntegers[] = {-2, -3, 8, 6, 12, 14,  4, 0, 16, 18, 20};
const int SIZE_OF_ARRAY = 11;

//Initialize hashOfIntegers values to false, probably unnecessary but good practice.
for(unsigned int i = 0; i < ARRAY_SIZE; i++) {
    hashOfIntegers[i] = false;
}

//Chage appropriate values to true.
for(int i = 0; i < SIZE_OF_ARRAY; i++) {
    //We subtract the MINIMUM value to normalize the MINIMUM value to a zero index for negative numbers.
    hashOfIntegers[someArrayOfIntegers[i] - MINIMUM] = true;
}

int sequence = 0;
int maxSequence = 0;
//Find the maximum sequence in the values
for(unsigned int i = 0; i < ARRAY_SIZE; i++) {

    if(hashOfIntegers[i]) sequence++;
    else sequence = 0;

    if(sequence > maxSequence) maxSequence = sequence;
}

cout << "MAX SEQUENCE: " << maxSequence << endl;
return 0;
}

The basic idea is to use the hash map as a bucket sort, so that you only have to do two passes over the data. This algorithm is O(2n), which in turn is O(n)

MobA11y
  • 18,425
  • 3
  • 49
  • 76
0

Don't get your hopes up, this is only a partial answer.

I'm quite confident that the problem is not solvable in O(n). Unfortunately, I can't prove it.

If there is a way to solve it in less than O(n^2), I'd suspect that the solution is based on the following strategy:

  1. Decide in O(n) (or maybe O(n log n)) whether there exists a continuous subarray as you describe it with at least i elements. Lets call this predicate E(i).
  2. Use bisection to find the maximum i for which E(i) holds.

The total running time of this algorithm would then be O(n log n) (or O(n log^2 n)).

This is the only way I could come up with to reduce the problem to another problem that at least has the potential of being simpler than the original formulation. However, I couldn't find a way to compute E(i) in less than O(n^2), so I may be completely off...

blubb
  • 9,510
  • 3
  • 40
  • 82
-1

here's another way to think of your problem: suppose you have an array composed only of 1s and 0s, you want to find the longest consecutive run of 1s. this can be done in linear time by run-length encoding the 1s (ignore the 0's). in order to transform your original problem into this new run length encoding problem, you compute a new array b[i] = (a[i] < a[i+1]). this doesn't have to be done explicitly, you can just do it implicitly to achieve an algorithm with constant memory requirement and linear complexity.

NQZ
  • 1
-2

Here are 3 acceptable solutions:

The first is O(nlog(n)) in time and O(n) space, the second is O(n) in time and O(n) in space, and the third is O(n) in time and O(1) in space.

  1. build a binary search tree then traverse it in order.
    keep 2 pointers one for the start of max subset and one for the end. keep the max_size value while iterating the tree. it is a O(n*log(n)) time and space complexity.

  2. you can always sort numbers set using counting sort in a linear time and run through the array, which means O(n) time and space complexity.

  3. Assuming there isn't overflow or a big integer data type. Assuming the array is a mathematical set (no duplicate values). You can do it in O(1) of memory:

    • calculate the sum of the array and the product of the array
    • figure out what numbers you have in it assuming you have the min and max of the original set. Totally it is O(n) time complexity.
Community
  • 1
  • 1
0x90
  • 39,472
  • 36
  • 165
  • 245
  • 3
    Could you elaborate on how to construct the binary tree? I assume you mean a balanced tree, but could you show what the tree looks like for the examples given by the OP? – blubb Apr 12 '13 at 12:05
  • 2
    I also came up with a red-black custom interval tree but could not figure out how to keep the sequence order in a good fashion. This is the key. Build a binary tree to sort the array won't be difficult. Identifying integer segments that were __contiguous in the initial array__ is much trickier ! Could you insist on that step ? – Rerito Apr 12 '13 at 12:32
  • @Rerito, if you understand how to find the max subset in an ordered array, it is the same thing. – 0x90 Apr 12 '13 at 16:36
  • @0x90 No it is not. You must find a contiguous and unsorted subarray, which is slightly different. – Rerito Apr 12 '13 at 16:37
  • @Rerito once you have the values in a binary tree and you traverse it in order it is equivalent to a sorted array. – 0x90 Apr 12 '13 at 17:56
  • I don't see any solution here. Sorting the array is not enough to find the subarray. Also, [counting sort is not O(n) if the integers are unlimited. If there are no duplicates; you don't need the sum and the product: `min`, `max`, `size` are enough.](http://stackoverflow.com/questions/15966113/longest-subarray-whose-elements-form-a-continuous-sequence/15987767#comment22955455_15987767) – jfs Apr 27 '13 at 15:43