11

I had an interview and there was the following question:

Find unique numbers from sorted array in less than O(n) time.

Ex: 1 1 1 5 5 5 9 10 10
Output: 1 5 9 10

I gave the solution but that was of O(n).

Edit: Sorted array size is approx 20 billion and unique numbers are approx 1000.

halfer
Deepu--Java

5 Answers

21

Divide and conquer:

  • Look at the first and last element of a sorted sequence (the initial sequence is data[0]..data[data.length-1]).
  • If both are equal, the only distinct element in the sequence is the first (no matter how long the sequence is).
  • If they are different, divide the sequence in two and repeat for each subsequence.

Solves in O(log n) in the average case, and O(n) only in the worst case (when each element is different).

Java code:

public static List<Integer> findUniqueNumbers(int[] data) {
    List<Integer> result = new LinkedList<Integer>();
    findUniqueNumbers(data, 0, data.length - 1, result, false);
    return result;
}

private static void findUniqueNumbers(int[] data, int i1, int i2, List<Integer> result, boolean skipFirst) {

    int a = data[i1];
    int b = data[i2];

    // homogeneous sequence a...a: a is the only value in this range
    if (a == b) {
        if (!skipFirst) {
            result.add(a);
        }
    }
    else {
        // divide & conquer (overflow-safe midpoint)
        int i3 = i1 + (i2 - i1) / 2;
        findUniqueNumbers(data, i1, i3, result, skipFirst);
        // skip the first element of the right half if it continues the left half's last run
        findUniqueNumbers(data, i3 + 1, i2, result, data[i3] == data[i3 + 1]);
    }
}
Peter Walser
  • O(log n) is not the average case. It's O(log n) (or better) only when there's a relatively large amount of duplication. In the general case, it's O(n). – Jim Mischel Nov 16 '14 at 19:30
  • You're right, but this solution *can* find the unique numbers in less than O(n), as it doesn't necessarily need to look at all the numbers. Having no duplication is the worst case, not the average case. – Peter Walser Nov 16 '14 at 20:23
  • @PeterWalser I think your solution is better than the others and fits my edited question. Before accepting your answer let me check other inputs. Thanks. – Deepu--Java Nov 17 '14 at 05:24
  • @PeterWalser: we know very little about the distribution of the input. you cannot assert what the average case is or isn't. – Karoly Horvath Nov 22 '14 at 11:37
  • what about this case [1,2]? – Manish Kasera Jan 15 '15 at 20:00
  • I think as per the question asked by the OP (20 billion numbers and only 1000 unique), this method works pretty well and retrieves the unique elements in O(log n) time. In the case of all-unique elements, no algorithm is known to solve it in less than O(n) anyway. – gaurav jain Aug 17 '15 at 17:39
16

I don't think it can be done in less than O(n). Take the case where the array contains 1 2 3 4 5: in order to get the correct output, each element of the array would have to be looked at, hence O(n).

DanielGibbs
  • I agree with you and I also gave the same answer but he told me that it's possible. That's why I am looking for the answer here because I haven't got yet how it's possible. – Deepu--Java Nov 16 '14 at 14:53
  • It's possible that the interviewer understood O(n) differently, or counted library functions as constant time. – DanielGibbs Nov 16 '14 at 14:55
  • You can make complexity as a function of something else (eg. distinct elements in the array). See my answer below. – ElKamina Nov 16 '14 at 19:56
5

If your sorted array of size n has m distinct elements, you can do it in O(m log n).

Note that this is going to be efficient when m << n (e.g. m = 2 and n = 100).

Algorithm:

Initialization: Current element y = first element x[0]

Step 1: Do a binary search for the last occurrence of y in x (can be done in O(log n) time). Let its index be i.

Step 2: Set y = x[i+1] and go to step 1.

Edit: In cases where m = O(n), this algorithm is going to perform badly. To alleviate this, you can run it in parallel with the regular O(n) algorithm; the meta-algorithm stops when either of the two completes.
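The two steps above can be sketched in Java; a minimal sketch, with class and method names of my own choosing (not from the answer):

```java
import java.util.ArrayList;
import java.util.List;

public class DistinctFinder {

    // Collect the distinct values of a sorted array: repeatedly
    // binary-search for the last occurrence of the current value,
    // then jump just past that run. O(m log n) for m distinct values.
    public static List<Integer> distinct(int[] x) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        while (i < x.length) {
            int y = x[i];
            result.add(y);
            i = lastIndexOf(x, y, i, x.length - 1) + 1; // jump past the run of y
        }
        return result;
    }

    // Binary search for the last index in [lo, hi] holding the value y.
    // Precondition: x is sorted and x[lo] == y.
    private static int lastIndexOf(int[] x, int y, int lo, int hi) {
        while (lo < hi) {
            int mid = lo + (hi - lo + 1) / 2; // round up so the loop terminates
            if (x[mid] == y) {
                lo = mid;     // the run of y extends at least to mid
            } else {
                hi = mid - 1; // the run ends before mid
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] data = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(distinct(data)); // [1, 5, 9, 10]
    }
}
```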

Shaishav Jogani
ElKamina
0

Since the data consists of integers, there is a finite number of distinct values that can occur between any two values. So start by looking at the first and last value in the array. If a[length-1] - a[0] < length - 1, there must be some repeated values. Put a[0] and a[length-1] into a constant-access-time container such as a hash set. If the two values are equal, you know that there is only one unique value in the array and you are done. Since the array is sorted, if the two values are different, you can look at the middle element next. If the middle element is already in the set of values, you know that you can skip the whole left part of the array and only analyze the right part recursively. Otherwise, analyze both the left and right parts recursively.

Depending on the data in the array you will be able to get the set of all unique values in a different number of operations. You get them in constant time O(1) if all the values are the same since you will know it after only checking the first and last element. If there are "relatively few" unique values, your complexity will be close to O(log N) because after each partition you will "quite often" be able to throw away at least one half of the analyzed sub-array. If the values are all unique and a[length-1] - a[0] = length - 1, you can also "define" the set in constant time because they have to be consecutive numbers from a[0] to a[length-1]. However, in order to actually list them, you will have to output each number, and there are N of them.

Perhaps someone can provide a more formal analysis, but my estimate is that this algorithm is roughly linear in the number of unique values rather than the size of the array. This means that if there are few unique values, you can get them in few operations even for a huge array (e.g. in constant time, regardless of array size, if there is only one unique value). Since the number of unique values is no greater than the size of the array, I claim that this makes the algorithm "better than O(N)" (or, strictly: "not worse than O(N) and better in many cases").
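The recursive idea above can be sketched in Java. This is a minimal sketch, not the answerer's code: class and method names are mine, and the "already in the set" pruning is expressed here as an equal-endpoints check, which skips the same homogeneous sub-ranges on sorted data:

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class UniqueViaSet {

    // Collect the distinct values of a sorted array into a set,
    // recursing into halves and stopping early on any sub-range
    // whose two endpoints hold the same value.
    public static SortedSet<Integer> unique(int[] a) {
        SortedSet<Integer> seen = new TreeSet<>();
        if (a.length > 0) {
            collect(a, 0, a.length - 1, seen);
        }
        return seen;
    }

    private static void collect(int[] a, int lo, int hi, SortedSet<Integer> seen) {
        seen.add(a[lo]);
        seen.add(a[hi]);
        // Equal endpoints: the whole sorted range is one repeated value,
        // so the entire sub-range can be skipped without scanning it.
        if (a[lo] == a[hi] || lo + 1 >= hi) {
            return;
        }
        int mid = lo + (hi - lo) / 2; // overflow-safe midpoint
        collect(a, lo, mid, seen);
        collect(a, mid, hi, seen);
    }

    public static void main(String[] args) {
        int[] data = {1, 1, 1, 5, 5, 5, 9, 10, 10};
        System.out.println(unique(data)); // [1, 5, 9, 10]
    }
}
```

As the answer notes, a range that is one long run costs only a constant number of comparisons, so the work grows with the number of distinct values rather than the array length.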

Michał Kosmulski
  • It seems you gave the solution for sequential entries but it can be 1 55 55 1000, like this also. – Deepu--Java Nov 16 '14 at 17:20
  • @DeepakTiwari The original question states that the array is sorted – Michał Kosmulski Nov 16 '14 at 17:48
  • @DeepakTiwari Yes, it's true that there is no gain in the case of 1, 55, 55, 1000 but we're looking at asymptotic behavior. If the number "55" was repeated 100 times in this sequence instead of just two, the complete set of values would require much fewer operations than 102, the length of the sequence. – Michał Kosmulski Nov 16 '14 at 18:53
0
import java.util.*;

/**
 * remove duplicates from a sorted array in average O(log n), worst O(n)
 * @author XXX
 */
public class UniqueValue {
    public static void main(String[] args) {
        int[] test = {-1, -1, -1, -1, 0, 0, 0, 0, 2, 3, 4, 5, 5, 6, 7, 8};
        UniqueValue u = new UniqueValue();
        System.out.println(u.getUniqueValues(test, 0, test.length - 1));
    }

    // i must be the start index, j the end index
    public List<Integer> getUniqueValues(int[] array, int i, int j) {
        if (array == null || array.length == 0) {
            return new ArrayList<Integer>();
        }
        List<Integer> result = new ArrayList<>();
        if (array[i] == array[j]) {
            // homogeneous range: a single unique value
            result.add(array[i]);
        } else {
            int mid = i + (j - i) / 2; // overflow-safe midpoint
            result.addAll(getUniqueValues(array, i, mid));

            // skip the rest of array[mid]'s run so the same value
            // is not reported by both halves
            int next = mid;
            while (next < j && array[next] == array[next + 1]) {
                next++;
            }
            if (next < j) {
                result.addAll(getUniqueValues(array, next + 1, j));
            }
        }
        return result;
    }
}
  • It is an interesting idea to use divide and conquer on this problem. My code is ready to run. The divide step is a little tricky: you have to avoid the same element appearing on both sides. – Qiang Dai Aug 24 '16 at 05:27