6

I am looking for a fast algorithm:

I have a int array of size n, the goal is to find all patterns in the array that
x1, x2, x3 are different elements in the array, such that x1+x2 = x3

For example I know there's a int array of size 3 is [1, 2, 3] then there's only one possibility: 1+2 = 3 (consider 1+2 = 2+1)

I am thinking about implementing Pairs and Hashmaps to make the algorithm fast. (the fastest one I got now is still O(n^2))

Please share your idea for this problem, thank you

Allan Jiang
  • 11,063
  • 27
  • 104
  • 165

3 Answers3

10

Edit: The answer below applies to a version of this problem in which you only want one triplet that adds up like that. When you want all of them, since there are potentially at least O(n^2) possible outputs (as pointed out by ex0du5), and even O(n^3) in pathological cases of repeated elements, you're not going to beat the simple O(n^2) algorithm based on hashing (mapping from a value to the list of indices with that value).


This is basically the 3SUM problem. Without potentially unboundedly large elements, the best known algorithms are approximately O(n^2), but we've only proved that it can't be faster than O(n lg n) for most models of computation.

If the integer elements lie in the range [u, v], you can do a slightly different version of this in O(n + (v-u) lg (v-u)) with an FFT. I'm going to describe a process to transform this problem into that one, solve it there, and then figure out the answer to your problem based on this transformation.

The problem that I know how to solve with FFT is to find a length-3 arithmetic sequence in an array: that is, a sequence a, b, c with c - b = b - a, or equivalently, a + c = 2b.

Unfortunately, the last step of the transformation back isn't as fast as I'd like, but I'll talk about that when we get there.


Let's call your original array X, which contains integers x_1, ..., x_n. We want to find indices i, j, k such that x_i + x_j = x_k.

  1. Find the minimum u and maximum v of X in O(n) time. Let u' be min(u, u*2) and v' be max(v, v*2).

  2. Construct a binary array (bitstring) Z of length v' - u' + 1; Z[i] will be true if either X or its double [x_1*2, ..., x_n*2] contains u' + i. This is O(n) to initialize; just walk over each element of X and set the two corresponding elements of Z.

    As we're building this array, we can save the indices of any duplicates we find into an auxiliary list Y. Once Z is complete, we just check for 2 * x_i for each x_i in Y. If any are present, we're done; otherwise the duplicates are irrelevant, and we can forget about Y. (The only situation slightly more complicated is if 0 is repeated; then we need three distinct copies of it to get a solution.)

    Now, a solution to your problem, i.e. x_i + x_j = x_k, will appear in Z as three evenly-spaced ones, since some simple algebraic manipulations give us 2*x_j - x_k = x_k - 2*x_i. Note that the elements on the ends are our special doubled entries (from 2X) and the one in the middle is a regular entry (from X).

  3. Consider Z as a representation of a polynomial p, where the coefficient for the term of degree i is Z[i]. If X is [1, 2, 3, 5], then Z is 1111110001 (because we have 1, 2, 3, 4, 5, 6, and 10); p is then 1 + x + x2 + x3 + x4 + x5 + x9.

    Now, remember from high school algebra that the coefficient of xc in the product of two polynomials is the sum over all a, b with a + b = c of the first polynomial's coefficient for xa times the second's coefficient for xb. So, if we consider q = p2, the coefficient of x2j (for a j with Z[j] = 1) will be the sum over all i of Z[i] * Z[2*j - i]. But since Z is binary, that's exactly the number of triplets i,j,k which are evenly-spaced ones in Z. Note that (j, j, j) is always such a triplet, so we only care about ones with values > 1.

    We can then use a Fast Fourier Transform to find p2 in O(|Z| log |Z|) time, where |Z| is v' - u' + 1. We get out another array of coefficients; call it W.

  4. Loop over each x_k in X. (Recall that our desired evenly-spaced ones are all centered on an element of X, not 2*X.) If the corresponding W for twice this element, i.e. W[2*(x_k - u')], is 1, we know it's not the center of any nontrivial progressions and we can skip it. (As argued before, it should only be a positive integer.)

    Otherwise, it might be the center of a progression that we want (so we need to find i and j). But, unfortunately, it might also be the center of a progression that doesn't have our desired form. So we need to check. Loop over the other elements x_i of X, and check if there's a triple with 2*x_i, x_k, 2*x_j for some j (by checking Z[2*(x_k - x_j) - u']). If so, we have an answer; if we make it through all of X without a hit, then the FFT found only spurious answers, and we have to check another element of W.

    This last step is therefore O(n * 1 + (number of x_k with W[2*(x_k - u')] > 1 that aren't actually solutions)), which is maybe possibly O(n^2), which is obviously not okay. There should be a way to avoid generating these spurious answers in the output W; if we knew that any appropriate W coefficient definitely had an answer, this last step would be O(n) and all would be well.

    I think it's possible to use a somewhat different polynomial to do this, but I haven't gotten it to actually work. I'll think about it some more....

Partially based on this answer.

Community
  • 1
  • 1
Danica
  • 28,423
  • 6
  • 90
  • 122
  • A set is not an array. This answer is wrong. For instance, consider an array where _all_ elements are equal to zero. Then all pairs of array elements equal every distinct element. Just to enumerate every pair for the sum is O(n^2). Associate a hash table on containers of matching elements, and you get amortised O(n^2) for computing a data structure that represents all possible triples (or O(n^3) for printing as triples). – ex0du5 Apr 24 '12 at 21:44
  • 1
    @ex0du5 there is an easy equivalence here. You can identify duplicated numbers in a single O(n) pass, then handle those sums in time O(n). If you have not found any then strip them out and you've now reduced the problem to the set problem described above. Conversely the set problem is solvable if we have a solution to the array problem. So they are basically the same, even though different in detail. – btilly Apr 24 '12 at 22:14
  • Thanks for pointing that out @ex0du5; I added handling of duplicates (as suggested by btilly) to the algorithm. – Danica Apr 24 '12 at 22:42
  • @btilly: My point wasn't that you couldn't solve my example faster, but that the equivalence to the 3SUM is worng because you are looking for a completely different data structure. You aren't looking at set membership, you are looking at array membership and require a structure that works to identify position in the array. Of the three variables, two are free and the third has value determined, but may be found in 0, 1, or many positions of the array. So to solve, you need to derive a solution to the array problem and give a data structure that stores positions, not values. – ex0du5 Apr 24 '12 at 23:15
  • As another example, look at array {1, 2, 3, 4, ..., n}. Obviously storing duplicates doesn't solve the issue of needing n(n-1)/2 different pairs to show the degrees of freedom. My point isn't that the 3-Sum problem isn't relevant, but that the actual problem has an obvious worst-case that is as hard or harder than 3-sum, so complexity calculation for 3-sum don't matter. If we weren't looking for patterns in array and were simply looking at values, then duplicates, etc. are not a concern. For finding a data structure here, though, position in the array must be stored somewhere. – ex0du5 Apr 24 '12 at 23:23
  • @ex0du5 I don't understand your complaint: if we solve it with values we can obviously find positions in O(n) time. To be specific, what's wrong with my algorithm given above (other than the last step, which is irrelevant if we simply change the problem to `x_i + x_j + x_k = 0` rather than `x_i + x_j = x_k` and so can probably be addressed in a way that's cleverer than I'm being right now)? – Danica Apr 25 '12 at 00:06
  • @Dougal: My point is simply that even with a lookup:Value->Position function that can be done in O(1), you still have potentially O(n^2) lookups for all potential pairs of array members. Even if you solve the problem O(1), you have that. After you've solved the problem, you need to map to language of the array. Maybe this is easier if I turn around the question. What do you output to solve the problem? How do you output your solution so that all possible array patterns are indicated? To be concrete, simply show the output for {1, 2, ..., n} as that has O(n^2) distinct solution pairs. – ex0du5 Apr 25 '12 at 00:49
  • Oh -- I just realized that I misread the original question, in that the OP wants *all* such triplets, not just one. Obviously there's an O(n^2) lower bound then when there are n^2 possible solutions (or n^3 when all are 0). I didn't understand what you meant by saying it's a value problem rather than sets -- that's not the problem, as your {1, 2, ..., n} example shows. – Danica Apr 25 '12 at 03:10
1

It has to be at least O(n^2) as there are n(n-1)/2 different sums possible to check for other members. You have to compute all those, because any pair summed may be any other member (start with one example and permute all the elements to convince yourself that all must be checked). Or look at fibonacci for something concrete.

So calculating that and looking up members in a hash table gives amortised O(n^2). Or use an ordered tree if you need best worst-case.

ex0du5
  • 2,586
  • 1
  • 14
  • 14
1

You essentially need to find all the different sums of value pairs so I don't think you're going to do any better than O(n2). But you can optimize by sorting the list and reducing duplicate values, then only pairing a value with anything equal or greater, and stopping when the sum exceeds the maximum value in the list.

Borodin
  • 126,100
  • 9
  • 70
  • 144