7

Given a list [a_1, a_2, ..., a_n] of (not necessarily distinct) integers, determine whether there exist pairwise distinct indices w,x,y,z such that a_w + a_x = a_y + a_z.

I know that one way is to use four nested for loops, one per index. Whenever two sums are equal, check whether the four indices are pairwise distinct; if they are, return true. If we exhaust all the possibilities, return false. This runs in O(n^4) time.
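For reference, that brute force could be sketched in Python like this (function name is mine):

```python
def four_sum_equal_brute(a):
    """O(n^4) brute force: try every quadruple of indices."""
    n = len(a)
    for w in range(n):
        for x in range(n):
            for y in range(n):
                for z in range(n):
                    # pairwise distinct indices with equal pair sums
                    if len({w, x, y, z}) == 4 and a[w] + a[x] == a[y] + a[z]:
                        return True
    return False
```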

Can we do better?

John
  • 4,596
  • 11
  • 37
  • 43

3 Answers

4

Compute all possible values of a_w + a_x and insert them into a hash table. For each inserted sum, also insert the pairs (a_w + a_x, w) and (a_w + a_x, x) into a second hash table.

Before inserting a sum into the first hash table, check whether it is already there. If it is, consult the second table: if either (a_w + a_x, w) or (a_w + a_x, x) is present, insert nothing (the current pair shares an index with the stored pair). If neither pair is in the second table, the answer is positive.

If no positive answer is found after processing all (w, x) pairs, there are no such pairwise distinct indices.

Time complexity is O(n^2). Space requirements are also O(n^2).
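A Python sketch of the two-hash-table approach above, assuming 0-based indices and iteration over pairs with w < x (names are mine):

```python
def has_equal_pair_sums(a):
    """O(n^2) time and space: two hash tables over pair sums."""
    sums = set()       # first table: every stored pair sum
    pairs = set()      # second table: (sum, index) for each stored pair
    n = len(a)
    for w in range(n):
        for x in range(w + 1, n):
            s = a[w] + a[x]
            if s in sums:
                if (s, w) not in pairs and (s, x) not in pairs:
                    return True    # stored pair and (w, x) share no index
                # shared index: insert nothing, keep the stored pair
            else:
                sums.add(s)
                pairs.add((s, w))
                pairs.add((s, x))
    return False
```

Since skipped pairs insert nothing, the second table only ever holds indices of the first pair seen for each sum, which is what makes the single pass work.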

It is possible to achieve O(n) space, at the cost of O(n^2 log n) time, with a slightly modified version of the algorithm from this answer: Sum-subset with a fixed subset size:

  1. Sort the list.
  2. Use a priority queue keyed by a_w + a_x, with w, x as values. Pre-fill this queue with n-1 elements, where x = 0 and w = 1 .. n-1.
  3. Repeatedly pop the minimal element (sum, w, x) from the queue and push (a_w + a_{x+1}, w, x+1) (but don't push when x + 1 >= w). Stop when two consecutively removed elements have the same sum.
  4. To handle duplicates, one could compare the w, x of consecutive elements having equal sums, but it's easier to use krjampani's pre-processing idea: if the sorted list contains two pairs of duplicates, or a single element duplicated 4 times, report success. Otherwise at most one value a is duplicated; leave only a single instance of it in the list and add its doubled value to the priority queue along with a "special" pair of indices: (2a, -1, -1).
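The four steps above could be sketched in Python with `heapq` (names are mine; the pre-processing from step 4 is folded in):

```python
from collections import Counter
import heapq

def has_equal_pair_sums_lowmem(a):
    """O(n) space, O(n^2 log n) time: enumerate pair sums in sorted order."""
    # Step 4 pre-processing (krjampani's idea): deal with duplicate values.
    freq = Counter(a)
    dups = [v for v, c in freq.items() if c >= 2]
    if len(dups) >= 2 or any(c >= 4 for c in freq.values()):
        return True
    vals = sorted(freq)       # step 1: sorted, with duplicates collapsed
    # Step 2: pre-fill the queue with (sum, w, x) for x = 0, w = 1..n-1.
    heap = [(vals[w] + vals[0], w, 0) for w in range(1, len(vals))]
    if dups:                  # "special" entry for the single duplicated value
        heap.append((2 * dups[0], -1, -1))
    heapq.heapify(heap)
    prev = None
    while heap:               # step 3: pop sums in increasing order
        s, w, x = heapq.heappop(heap)
        if s == prev:
            return True       # two consecutively popped equal sums
        prev = s
        if x >= 0 and x + 1 < w:
            heapq.heappush(heap, (vals[w] + vals[x + 1], w, x + 1))
    return False
```

After deduplication all values are distinct, so two equal sums can only come from pairs with disjoint indices, which is why comparing consecutive popped sums suffices.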
Evgeny Kluev
  • 24,287
  • 7
  • 55
  • 98
  • But how do you know if the indices are pairwise different? – John Oct 12 '12 at 16:49
  • @John: by construction in your loops. Instead of looping over `i in 1..n, j in 1..n`, loop over `i in 1..n-1, j in i..n`. – Fred Foo Oct 12 '12 at 16:50
  • Yup, but we want `w,x,y,z` to be pairwise different. – John Oct 12 '12 at 16:52
  • @John: Insert indexes into hash map (sum is a key, list of indexes is a value), on second step filter out all matching indexes. – Evgeny Kluev Oct 12 '12 at 16:52
  • @EvgenyKluev: wouldn't the filtering step take linear time? – Fred Foo Oct 12 '12 at 16:55
  • Then I think the space complexity is O(n^2), which is not preferred for large n. – Imposter Oct 12 '12 at 16:56
  • @larsmans: You are right, solution with list of indexes is not good. Instead we can use second hash table, with composite key. And insert both (`a_w + a_x`, w) and (`a_w + a_x`, x). Then search value in the first table and filter out repeating indexes using second table. – Evgeny Kluev Oct 12 '12 at 17:06
  • 1
    @Imposter: in case O(n^2) space is not acceptable, there is other algorithm, referred in the answer. – Evgeny Kluev Oct 12 '12 at 17:23
  • @EvgenyKluev I believe your algorithm is currently incorrect. Try input `[0,0,0,0]`. Then your first hash table contains `{0}`, and your second hash table contains `{(0,1), (0,2), (0,3), (0,4)}`. Then the algorithm returns `false`, even though the answer is clearly `true`. – John Oct 12 '12 at 18:16
  • @John: That's right. I had to fix this algorithm again. Now it works in one pass and should be correct. And I agree with krjampani, his algorithm (for O(n^2) space) is simpler. – Evgeny Kluev Oct 12 '12 at 18:58
3

Evgeny's solution can be somewhat simplified by preprocessing the original array as follows.

We first use a hash table to count the frequency of each element in the original array. If at least 2 elements have duplicates (their frequency is at least 2), or if some element occurs with frequency at least 4, the answer is true. Otherwise, if an element a occurs with frequency 2 or 3, we add 2a to a second hash table and replace all copies of a with a single copy in the original array.

Then in the modified array, for each pair of indices i, j with i < j, we add a_i + a_j to the second hash table and return true if we find a duplicate entry in this hash table.
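A Python sketch of this simplified version (names are mine). After the pre-processing, all remaining values are distinct, so any two distinct pairs with equal sums automatically use four distinct indices:

```python
from collections import Counter

def has_equal_pair_sums_simple(a):
    """O(n^2) time and space: frequency pre-processing plus one sum set."""
    freq = Counter(a)
    dups = [v for v, c in freq.items() if c >= 2]
    # Two duplicated values, or one value appearing 4+ times: answer is true.
    if len(dups) >= 2 or any(c >= 4 for c in freq.values()):
        return True
    seen = set()
    if dups:
        seen.add(2 * dups[0])   # sum of the two copies of the duplicated value
    vals = list(freq)           # the array with duplicates collapsed
    for j in range(len(vals)):
        for i in range(j):
            s = vals[i] + vals[j]
            if s in seen:
                return True     # distinct values => the two pairs are disjoint
            seen.add(s)
    return False
```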

krjampani
  • 2,902
  • 20
  • 23
0

If you have 8.5GB of memory (more for unsigned ints, less if either the sums or indices don't span the whole int range), create three arrays. The first uses 1 bit per possible sum: it is a bitmap of results. The second uses 32 bits per possible sum: it records the index j. The third uses 1 bit per possible sum: it is a bitfield recording whether that sum has been encountered during the current iteration i (zero it at the start of each iteration). Iterate i = 0...n and j = i+1...n. For each sum, check whether its bit is set in the first bitmap (whether the sum was encountered before). If it is, check whether the index recorded in the second array matches either i or j (whether the old j matches either the new i or the new j). If it does not, check whether the bit in the third bitfield is set (whether the sum was set during the current iteration, meaning the old i matches the new i). If it is not, you have a match! (The old i will never match the old j or the new j, and the new i will never match the new j.) Exit. Otherwise, record the sum in all three arrays and continue.
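A scaled-down Python sketch of the three-array scheme (names and the `max_abs` bound on element values are mine; the real proposal sizes the arrays for the full 32-bit range):

```python
def has_equal_pair_sums_bitmap(a, max_abs):
    """Three flat arrays indexed by pair sum; values must lie in [-max_abs, max_abs]."""
    n = len(a)
    size = 4 * max_abs + 1        # pair sums lie in [-2*max_abs, 2*max_abs]
    offset = 2 * max_abs
    seen = bytearray(size)        # 1st array: has this sum ever been produced?
    j_of = [0] * size             # 2nd array: the j of the last recorded pair
    for i in range(n):
        current = bytearray(size) # 3rd array: sums produced during this i
        for j in range(i + 1, n):
            s = a[i] + a[j] + offset
            # seen before, old j shares no index, and old i != new i => match
            if seen[s] and j_of[s] != i and j_of[s] != j and not current[s]:
                return True
            seen[s] = 1
            j_of[s] = j
            current[s] = 1
    return False
```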

Although it uses $40 worth of memory (I love living in the present :), this is probably much faster than using hash maps and boxing, and it may even use less memory than hash maps for large n. One downside is that the data will almost never be in the L2 cache, but try setting the JVM to use huge pages so that at least a TLB miss won't go all the way to main memory. It is O(n^2) for processing and O(1) for memory.

Aleksandr Dubinsky
  • 22,436
  • 15
  • 82
  • 99