
Taken from Introduction to Algorithms

Describe a Θ(n lg n)-time algorithm that, given a set S of n integers and another integer x, determines whether or not there exist two elements in S whose sum is exactly x.

This is my best solution implemented in Java so far:

    public static boolean test(int[] a, int val) {
        mergeSort(a);

        for (int i = 0; i < a.length - 1; ++i) {
            int diff = (val >= a[i]) ? val - a[i] : a[i] - val;

            if (Arrays.binarySearch(a, i, a.length, diff) >= 0) {
                return true;
            }
        }

        return false;
    }

Now my first question is: is this a correct solution? From my understanding, mergeSort should perform the sort in O(n lg n), and the loop should take O(n lg n) (n iterations, each multiplied by O(lg n) for the binary search), resulting in O(2n lg n) = O(n lg n), so it should be correct.

My second question is: are there any better solutions? Is sorting the array essential?

helpermethod
  • Are you allowed to use hash sets? If so, then you can get to O(n), see below. – Itay Maman Jan 31 '10 at 14:41
  • `(val >= a[i]) ? val - a[i] : a[i] - val` this is what `Math.abs()` is for :) – BlueRaja - Danny Pflughoeft Jan 31 '10 at 16:40
  • Yes, `Math.abs` is for that, but I would say even more: it's not needed here. In fact it is wrong. `diff` must always be assigned the value `val-a[i]` regardless of whether this difference is positive or negative. – Ernesto Jul 21 '11 at 19:13
  • as suggested in a few of the answers below there is a better O(n) sol'n given a few assumptions on the input. – Zahra Apr 12 '14 at 22:31
  • Yes, sorting seems to be necessary, but once you have a sorted list you can improve. Instead of naively using binary search, you can think of the following method: initialise pointers l, r to the start and end of the list. Check a[l]+a[r] = f; if f > x then r--; else if f < x then l++. – Prakhar Agrawal Oct 09 '16 at 16:56

8 Answers

50

Your solution seems fine. Yes, you need to sort because it's a prerequisite for binary search. You can make a slight modification to your logic as follows:

public static boolean test(int[] a, int val) 
{
    Arrays.sort(a);

    int i = 0;            // index of first element.
    int j = a.length - 1; // index of last element. 

    while(i<j)
    {
        // check if the sum of elements at index i and j equals val, if yes we are done.
        if(a[i]+a[j] == val)
            return true;
        // else if sum is more than val, decrease the sum.
        else if(a[i]+a[j] > val)
            j--;
        // else if sum is less than val, increase the sum.
        else
            i++;
    }
    // failed to find any such pair..return false. 
    return false;
}
codaddict
14

There's another very fast solution: imagine you have to solve this problem in Java for about 1 billion integers. You know that in Java integers go from -2**31 to 2**31 - 1.

Create an array of 2**32 bits (roughly 500 MB, trivial to do on today's hardware).

Iterate over your set: if you have an integer, set corresponding bit to 1.

O(n) so far.

Iterate again over your set: for each value, check if the bit at "x - current value" is set.

If you have one, you return true.

Granted, it needs 500 MB of memory.

But this will beat any O(n log n) solution if you have, say, to solve that problem with 1 billion integers.

O(n).
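
A minimal sketch of this bitmap idea (my illustration, not the answerer's actual code), assuming 32-bit Java ints and that the input really is a set, so a value is never paired with itself; `testBitmap` is a hypothetical name:

    // Illustrative sketch only: a bitmap over the full 32-bit int range.
    // Assumes the input contains no duplicates (it is a set).
    public static boolean testBitmap(int[] a, int x) {
        long[] bits = new long[1 << 26];                // 2^32 bits, roughly 500 MB
        for (int v : a) {
            long u = v & 0xFFFFFFFFL;                   // treat the int as an unsigned index
            bits[(int) (u >>> 6)] |= 1L << (u & 63);    // mark the value's bit: O(n)
        }
        for (int v : a) {
            long need = (long) x - v;                   // complement, computed in long to dodge overflow
            if (need < Integer.MIN_VALUE || need > Integer.MAX_VALUE)
                continue;                               // complement is not a 32-bit int
            if (need == v)
                continue;                               // would pair the element with itself
            long u = need & 0xFFFFFFFFL;
            if ((bits[(int) (u >>> 6)] & (1L << (u & 63))) != 0)
                return true;                            // complement is present: O(n) total
        }
        return false;
    }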

SyntaxT3rr0r
  • So you allocate, and zero-out, 500 MB of memory, to solve this problem for n=10000? – meriton Jan 31 '10 at 15:01
  • I quote from this answer: "Imagine you have to solve this problem in Java for about 1 billion integers". I think it's pretty clear that when OldEnthusiast says "very fast", he means very fast for large problems, i.e. low complexity. This is not the most efficient solution for n=1: `return false;` is ;-) – Steve Jessop Jan 31 '10 at 15:43
  • +1 poster asked for Θ(n log n) time, this trumps even that; he said nothing about space constraints :) – BlueRaja - Danny Pflughoeft Jan 31 '10 at 16:25
  • -1. This does not work for all input and for small input, the hidden constant is huge. –  Jan 31 '10 at 18:10
  • @Moron: what input does this fail for? That the questioner's code works for, I mean: there's the bug where if x is equal to twice the value of a number in the set, then you can get a false positive, but assuming that "set" in the question means no duplicates, the fix is the same in both cases and just requires checking the special case. – Steve Jessop Jan 31 '10 at 23:06
  • @Steve Jessop. 32 bit integers are assumed. I don't see that stated anywhere in the problem statement. Do you? 64 bit machines are common these days. –  Feb 01 '10 at 00:32
  • @Moron: The problem also doesn't say anything about memory constraints, so search the list for the largest `n` then create an array of that size and run this algorithm. That is `O(k)` (where k is the largest value), due to the zeroing of the array. Algorithms such as this are known as pseudo-polynomial (http://en.wikipedia.org/wiki/Pseudo-polynomial_time), and while helpful in some *(many)* cases in the real world, it is not a solution to this problem as `k` is unrelated to `n`. – BlueRaja - Danny Pflughoeft Feb 01 '10 at 02:33
  • @BlueRaja, the problem does not talk about the memory constraints. It does not talk about a model of computation on which to base the time complexity etc, either. Do you really want to go into that discussion? –  Feb 01 '10 at 03:44
  • @Moron: I am agreeing with you why are you arguing with me – BlueRaja - Danny Pflughoeft Feb 01 '10 at 03:47
  • @BlueRaja: That was not an argument. Anyway, sorry I misunderstood you. –  Feb 01 '10 at 04:31
  • @Moron: The solution given is in Java, so int is 32bit regardless of the word size of the machine. Granted, this solution does not work in all programming languages or for all possible formats of the input integers, but I asked for inputs where this fails *and the questioner's code works*. Of course possibly you would have voted down the questioner's code on the same basis (doesn't handle 64bit input), had it been offered as an answer :-) – Steve Jessop Feb 01 '10 at 10:29
  • Nothing stops you from having a strategy at the beginning where you use the HashSet O(n) solution for small samples and the 500 MB array solution for huge samples. Note that the original question specifically passes an int[] containing the "set of integers" to his O(n log n) solution. – SyntaxT3rr0r Feb 01 '10 at 13:52
  • @Steve Jessop: If/when Java moves to 64 bits, the questioner has to do nothing, but this 500MB solution has to change and the memory usage will become impossible to handle (2^64 bits of memory is probably more than the atoms in the galaxy etc etc). In other words, this does not scale. @OldEnthusiast: The basic idea of your algorithm is still unscaleable. –  Feb 01 '10 at 15:17
  • If anyone is carefully writing their Java code against the possibility that `int` will someday be 64 bits, I'll happily point and laugh at them. – Steve Jessop Feb 01 '10 at 17:56
  • You have it backwards. Read again. Anyway, I am not interested in discussing this further. I have stated my case. –  Feb 02 '10 at 01:40
  • This wouldn't work. Say X is 10 and you're looking at the number 5. 10-5 = 5, meaning that 5 exists in the array, however, you only have a single 5. – wk1989 Nov 26 '11 at 00:04
7
  1. This is correct; your algorithm will run in O(n lg n) time.

  2. There is a better solution: your logic for calculating diff is incorrect. Regardless of whether a[i] is greater than or less than val, you still need diff to be val - a[i].
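
To make the fix concrete, here is a sketch of the questioner's method with that correction applied (my illustration, assuming `java.util.Arrays` is imported); the binary search also starts at `i + 1`, so an element is never matched with itself, per the false-positive issue raised in the comments above:

    // Sketch of the corrected approach: diff is always val - a[i], and the
    // search range starts after i so a[i] cannot satisfy the sum on its own.
    // (Like the original, this ignores possible int overflow in val - a[i].)
    public static boolean testFixed(int[] a, int val) {
        Arrays.sort(a);                                      // O(n lg n)
        for (int i = 0; i < a.length - 1; ++i) {
            int diff = val - a[i];                           // the complement we need
            if (Arrays.binarySearch(a, i + 1, a.length, diff) >= 0) {
                return true;                                 // O(lg n) per lookup
            }
        }
        return false;
    }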

danben
  • Yes! I was reading this thread and was wondering if nobody noticed that the diff must always be `val - a[i]`. – Ernesto Jul 21 '11 at 19:10
5

Here's an O(n) solution using a hash-set:

    public static boolean test(int[] a, int val) {
        // First look for val/2 in the array: two such elements sum to val
        int c = 0;
        for (int n : a) {
            if (n * 2 == val)
                ++c;
        }
        if (c >= 2)
            return true; // Yes! - found at least two of them

        // Now look for pairs not involving val/2
        Set<Integer> set = new HashSet<Integer>();
        for (int n : a)
            set.add(n); // Arrays.asList() doesn't box an int[], so add elements one by one
        for (int n : a) {
            if (n * 2 == val)
                continue;
            if (set.contains(val - n))
                return true;
        }

        return false;
    }
Itay Maman
  • What if there are collisions during HashSet lookup? Their number must be bounded for the amortized lookup to be in O(1). What about the time to allocate the backing array? How do you know a O(n) backing array is enough? – meriton Jan 31 '10 at 14:42
  • @meriton I don't follow you. For all practical purposes (and in particular in these types of questions) a hash-table lookup can be considered as O(1). If we were talking about exact time then the points you raise may be viable, but still, an O(n) algorithm will eventually beat O(nlogn) if n is large enough. – Itay Maman Jan 31 '10 at 14:47
  • For instance, assume hashCode(i) = i % size. I now insert n multiples of size into the set. The hashSet has degenerated to a linear list. – meriton Jan 31 '10 at 14:53
  • @meriton: your comment makes no sense. Itay's solution is O(n), just as mine. – SyntaxT3rr0r Jan 31 '10 at 14:53
  • And what if I now replied that your comment makes no sense either? Do you think this discussion would get anywhere? To clarify: The proof that a HashSet offers O(1) lookup assumes a good distribution of hash values, which depends on your input. If all numbers in the input get the same bucket in the hashSet, the HashSet will be pretty slow (java.util.HashSet uses a LinkedList to hold the contents of a bucket). Usually, your input will distribute well, but not always. Hence, a hashSet does not guarantee a constant worst-case lookup complexity. – meriton Jan 31 '10 at 15:06
  • @Itay: true, `hashCode(i)==i`, but HashSet doesn't use 2^32 buckets, it uses fewer than that. So the real hash function used is a modulus over the number of buckets. AFAIK the Java HashSet re-hashing strategy doesn't guarantee a worst-case O(1) elements per bucket (except in the trivial sense that almost anything done on bounded-size integers is O(1)), since it's based on load factor alone, not on the occupancy of the worst bucket in the table. – Steve Jessop Jan 31 '10 at 15:34
  • @OldEnthusiast: meriton is right, this is **not** `O(n)`, because Hash-table lookups are not worst-case `O(1)` (which means they are not worst-case `Θ(1)`), they are average-case `Θ(k/n)` (see http://en.wikipedia.org/wiki/Hash_table#Performance_analysis). When a problem says *"find an O(something)-time algorithm,"* it always means worst-case. However, I will not give a -1, since, though it is not the answer to this particular problem, it *is* the solution I would use if I encountered this problem in the real-world. – BlueRaja - Danny Pflughoeft Jan 31 '10 at 16:31
  • @BlueRaja: "When a problem says "find an O(something)-time algorithm," it always means worst-case." - this is false and represents a poor understanding of big-O notation. Big-O has everything to do with the growth rate of a particular function and nothing to do with best/worst/average case. A function can very well be O(n) in the average case and O(n^2) in the worst case. – danben Jan 31 '10 at 17:28
  • @danben: I understand that; but when they don't state *"worst-case,"* *"average-case,"* or *"best-case"* in a problem for an Algorithms class, it's always implicitly understood that the problem is asking for *"worst-case."* This answer is very good in the average case, but not in the worst-case, which is where the confusion comes from, because the OP did not explicitly say one or the other. – BlueRaja - Danny Pflughoeft Jan 31 '10 at 17:37
  • @danben: presumably if Itay is allowed to assume average case instead of worst case, then I'm allowed to assume best case instead of worst case, and present an algorithm which is Theta(n) best case but Theta(n^2) average and worst case? The reason worst case should be assumed unless stated otherwise is precisely to bar students from this kind of hijinks. – Steve Jessop Jan 31 '10 at 23:18
4

I do think I have spotted a minor bug in your implementation, but testing should uncover that one quickly.

The approach looks valid and will reach the desired performance. You might simplify it by replacing the binary search with a linear scan that resumes where the previous scan left off:

int j = a.length - 1;
for (int i = 0; i < a.length; i++) {
    // keep j > i so we stay in bounds and never pair an element with itself
    while (j > i && a[i] + a[j] > val) {
        j--;
    }
    if (j > i && a[i] + a[j] == val) {
        // heureka!
    }
}

This step is O(n). (Proving that is left as an exercise for you.) Of course, the entire algorithm still takes O(n log n) for the merge sort.

meriton
4

A simple solution is, after sorting, to move two pointers inward from both ends of the array, looking for a pair that sums to x. If the sum is too high, decrement the right pointer. If it is too low, increment the left one. If the pointers cross, the answer is no.

Neal Gafter
2

Your analysis is correct, and yes you must sort the array or else binary search does not work.

Sean Owen
0

Here is an alternative solution, obtained by adding a few extra checks to merge sort.

public static void divide(int array[], int start, int end, int sum) {

    if (array.length < 2 || (start >= end)) {
        return;
    }
    int mid = (start + end) >> 1; //[p+r/2]
    //divide
    if (start < end) {
        divide(array, start, mid, sum);
        divide(array, mid + 1, end, sum);
        checkSum(array, start, mid, end, sum);
    }
}

private static void checkSum(int[] array, int str, int mid, int end, int sum) {

    int lsize = mid - str + 1;
    int rsize = end - mid;
    int[] l = new int[lsize]; //init
    int[] r = new int[rsize]; //init

    //copy L
    for (int i = str; i <= mid; ++i) {
        l[i-str] = array[i];
    }
    //copy R
    for (int j = mid + 1; j <= end; ++j) {
        r[j - mid - 1] = array[j];
    }
    //SORT MERGE
    //SORT MERGE
    int i = 0, j = 0, k = str;
    while ((i < l.length) && (j < r.length) && (k <= end)) {
        //sum-x-in-Set modification
        if (sum == l[i] + r[j]) {
            System.out.println("THE SUM CAN BE OBTAINED with the values" + l[i] + " " + r[j]);
        }
        if (l[i] < r[j]) {
            array[k++] = l[i++];
        } else {
            array[k++] = r[j++];
        }
    }
    //left over
    while (i < l.length && k <= end) {
        array[k++] = l[i++];
          //sum-x-in-Set modification
        for(int x=i+1; x < l.length; ++x){
            if(sum == l[i] + l[x]){
                System.out.println("THE SUM CAN BE OBTAINED with the values" + l[i] + " " + l[x]);
            }
        }
    }
    while (j < r.length && k <= end) {
        array[k++] = r[j++];
          //sum-x-in-Set modification
        for(int x=j+1; x < r.length; ++x){
            if(sum == r[j] + r[x]){
                System.out.println("THE SUM CAN BE OBTAINED with the values" + r[j] + " " + r[x]);
            }
        }
    }
}

However, the complexity of this algorithm is still not Θ(n lg n).

sij