searching through a vast collection of potential solutions

Question

I have a quite difficult problem (perhaps even a NP-hard problem ^^) with looking for a solution in a massive collection of results. Perhaps there is an algorithm for it.

Below exercise is artificial but is a perfect example to illustrate my issue.

There is a big array with integers. Lets say it has 100.000 elements.

int numbers[] = {-123,32,4,-234564,23,5,....}

I want to check in a relatively quick way if a sum on any 2 numbers from this array is equal to 0. In other words, if the array has "-123" I want to find is there also a "123" number.

The easiest solution would be brute force - check everything with everything. That gives 100.000 x 100.000 a big number ;-) Obviously brute force method can by optimised. Order numbers and check negatives against positive only. My question is - is there something better then optimised brute force to find a solution?

Of course it's not NP-hard. You've already described a polynomial solution: trying all pairs is `O(n^2)`. — Steve Jessop, Oct 10 '12 at 16:14
Sort and check negatives against positive leads to an O(n lg n) algorithm, which is quite efficient for a problem with about 100k size. Have you tried it? How strict is your time limit? Isn't it efficient enough? — timrau, Oct 10 '12 at 16:16
Note that though this problem is solveable in polynomial time, as others have said, the more general problem (Is there a subset of arbitrary size) is NP-Complete, and is known as the Subset-Sum Problem. — amit, Oct 10 '12 at 16:24
See list of ten duplicates of this question in [my answer](http://stackoverflow.com/a/12775802/837847) to [Determine if array contains two elements which equal a certain sum?](http://stackoverflow.com/questions/12774823/determine-if-array-contains-two-elements-which-equal-a-certain-sum) — James Waldby - jwpat7, Oct 10 '12 at 17:26

Steve Jessop · Answer 1 · 2012-10-10T16:33:19.057

First, sort the array by magnitude of the value.

Then, if the data contains a pair which satisfies the conditions you're after, it contains such a pair adjacent in the array. So just sweep through looking for adjacent pairs whose sum is 0.

Overall time complexity is O(n log n) for the sort, could be O(n) if you use "cheating" sorts not based solely on comparisons. Clearly it can't be done in less than linear time, because in the worst case you can't do it without looking at all the elements. I think n log n is probably optimal in the decision tree model of computing, but only because it "feels a bit like" the element uniqueness problem.

Alternative approach:

Add the elements one at a time to a hash-based or tree-based container. Before adding each element, check whether its negative is present. If so, stop.

This is likely to be faster in the case where there are lots of suitable pairs, because you save the cost of sorting the whole data. That said, you could write a modified sort that exits early by checking for adjacent pairs as soon as any subset of the data is in its final order, but that's effort.

score 3 · Answer 2 · answered Oct 10 '12 at 16:16

Brute force would be an O(n^2) solution. You can certainly do better.

Off the top of my head, first sort it. Heap sort will have a complexity of O(nlogn).

Now, for the first element, say a, you know you need to find an element b, such that a+b = 0. This can be found using binary search (since your array is now sorted). Binary search has a complexity of O(logn).

This gives you an overall solution of O(nlogn) complexity.

dunadar · Accepted Answer · 2012-10-13T15:44:41.100

The example you provided can be brute-force solved in O(n^2) time.

You can start ordering the numbers (O(n·logn)) from smaller to bigger. If you place one pointer at the beginning (the "most negative number") and other at the end (the "most positive"), you can check if there is such pair of numbers in an additional O(n) steps by following the next procedure:

If the numbers at both pointers have the same module, you have the solution
If not, move the pointer of the number with bigger module towards "zero" (this is, increase if it is the pointer on the negative side, decrease if it is the positive-side one)
Repeat until finding a solution, or the pointers cross.

Total complexity is O(n·logn)+O(n) = O(n·logn).

score 0 · Answer 4 · answered Oct 10 '12 at 16:26

Sort your array using Quicksort. After this happened, use two indexes, let's call them positive and negative.

positive <- 0
negative <- size - 1

while ((array[positive] > 0) and (array(negative < 0) and (positive >= 0) and (negative < size)) do
    delta <- array[positive] + array[negative]
    if (delta = 0) then
        return true
    else if (delta < 0) then
        negative <- negative + 1
    else
        positive <- positive - 1
    end if
end while
return (array[positive] * array[negative] = 0)

You didn't say what should the algorithm do if 0 is part of the array, I've supposed that in this case true should be returned.

searching through a vast collection of potential solutions

4 Answers4