Storing pairwise sums in linear space

Question

If we have two arrays of size n each and want to sort their sums, the naive approach would be to store their sums in O(n^2) space and sort it in O(n^2 logn) time. Suppose we're allowed to have the same running time of O(n^2 logn), how would we store the sums in linear space of O(n)?

I suppose we're not intending to store all the sums since they n^2 elements won't fit into n space and that we're merely printing out everything in sorted order, so does this mean that we have to dynamically store the items? Any tips?

(this is a homework problem)

What do you mean exactly by "sums of two arrays"? Please provide an example with expected output and what you have tried so far. — igon, Apr 29 '15 at 21:06
If we have 1 2 3 4 5 and 2 3 4 5 6, then the sums would be 3 4 5 6 7 4 5 6 7 8 5 6 7 8 9 6 7 8 9 10 7 8 9 10 11. — maregor, Apr 29 '15 at 21:28

score 4 · Accepted Answer · answered Apr 29 '15 at 21:19

4

The problem, as I understand it, is that we want to find the sums

a1 + b1  a1 + b2  ...  a1 + bn
a2 + b1  a2 + b2  ...  a2 + bn
  ...      ...    ...    ...
an + b1  an + b2  ...  an + bn

and print them in sorted order. The restriction is to use only O (n) memory and O (n^2 log n) time in the process.

Consider the above table as n lists (lines) of n elements each. If we sort the initial arrays so that a1 <= a2 <= ... <= an and b1 <= b2 <= ... <= bn, each list is already sorted. Now, the problem is reduced to merging n sorted lists.

To devise that, think of how to merge two sorted lists (as in MergeSort), then three lists, and so on. This extends trivially to merging n lists of length n each in n operations for each output element, for a total of O (n^3). Now, what's left is to reduce time for getting each output element down to O (log n). As you ask for a hint but not a complete solution, see if you can handle that step yourself.

answered Apr 29 '15 at 21:19

Gassa

8,546
3
29
49

1

You mention that you want to merge n sorted lists. How would we merge all those lists into n space if there n lists themselves occupy more than n space? – maregor Apr 29 '15 at 21:35
1

@maregor A list is not stored explicitly. Instead, we just store a pointer to where we stopped traversing that list. For example, if we now want the 4th element of 9th list, we know it is `a9 + b4`. If we choose to consume (output) that element, the pointer in that list moves to its 5th value, `a9 + b5`. The pointer is just a number from `1` to `n+1`, the last one meaning the list is already exhausted. – Gassa Apr 29 '15 at 21:41
So the sorting time for both arrays is just O(nlogn) + O(nlogn) right? – maregor Apr 29 '15 at 22:24
1

@maregor Yeah, as they are both of size `n`, and we sort them independently. – Gassa Apr 29 '15 at 22:43
@Palcente That depends on [pivot selection strategy](http://stackoverflow.com/questions/164163/quicksort-choosing-the-pivot). – Gassa Apr 29 '15 at 22:47
quicksort runs at O(n lg n) on average and O(n^2) in worst case (if the arrays are already sorted)... – Palcente Apr 29 '15 at 22:48
Once both arrays are sorted, I was thinking of having a pointer at each array to add up the sums and print them but it may not work well and the O(n) space may not be used. Should I just store the first list in the O(n) space and compare other possible lists to it? – maregor Apr 30 '15 at 01:16
@maregor I don't know how to do that with two pointers only. Imagine `n = 10` and the pointers are at `(5, 5)`, thus pointing at the sum `a5 + b5`. How do you find what's next? It may well be `a4 + b7`, `a1 + b6` or `a6 + b2`. That said, I don't claim such a solution is impossible, I just don't see it. – Gassa Apr 30 '15 at 10:22
That's true, so that's a bad idea. But how would using n space make it work? – maregor Apr 30 '15 at 22:04
@maregor For each row `i`, we remember `j` pointing at the sum `ai + bj` we stopped on. – Gassa May 03 '15 at 14:41

JuniorCompressor · Answer 2 · 2015-04-30T23:35:09.387

3

In python you can something like that:

import heapq

a = [2, 1, 3]
b = [4, 6, 5]

a.sort()
b.sort()

def add_to_b(x):
    for v in b:
        yield v + x

for v in heapq.merge(*[add_to_b(x) for x in a]):
    print v

Result:

The idea is that we sort both arrays. Then adding to b an element of a defines a generator of increasing numbers. So we create n such generators and we merge them using heapq.merge. A generator, represented by add function above, at a specific time needs constant space (space needed to keep the current position in b). heapq.merge itself needs linear space. So we need linear space for the execution of the algorithm.

edited Apr 30 '15 at 23:35

answered Apr 29 '15 at 22:16

JuniorCompressor

19,631
4
30
57

1

Wow, that's one short implementation! Python really shines here. – Gassa Apr 29 '15 at 22:49
Could you explain for v in heapq.merge(*map(add, a)):? I'm not really good at python and could use some explanation. – maregor Apr 30 '15 at 23:32
1

`[add_to_b(x) for x in a]` will create a generator for each `x` belonging to `a`. The generator adds `x` to each element of `b`. We pass this sequence of generators to `heapq.merge` that will merge them. – JuniorCompressor Apr 30 '15 at 23:38
The add_to_b adds all v elements in b, but you are passing all elements in a to it. How can I visualize what is happening there? – maregor May 01 '15 at 00:58
1

if `a = [1, 2, 3]`, `b = [4, 5, 6]` then `add_to_b(1)` will generate `1 + 4, 1 + 5, 1 + 6`, `add_to_b(2)` will generate `2 + 4, 2 + 5, 2 + 6`, `add_to_b(3)` will generate `3 + 4, 3 + 5, 3 + 6`. So we have 3 sequences a) `5, 6, 7` b) `6, 7, 8` c) `7, 8, 9` that need to be merged. – JuniorCompressor May 01 '15 at 10:03

coderz · Answer 3 · 2015-06-01T02:01:42.547

First sort the 2 arrays in ascending order, the time complexity is 2 * O(n*lgn), which can be also regarded as O(n*lgn). Then use a max heap with length n + 1 to maintain the minimum n sums.

How to maintain the minimum n sums? First push a1 + b1, a1 + b2, a1 + b3, ..., a1 + bn to the heap. Then for every a[i], 1 <= i < n and b[j], 0 <= j < n, push a[i] + b[j] and then pop the largest one:

for(int j=0;j<n;j++) {
    heap.push_into(a[0] + b[j]);
}
for(int i=1;i<n;i++) {
    for(int j=0;j<n;j++) {
        heap.push_into(a[i] + b[j]);
        heap.pop(); // remove the current largest sum in the heap, then the sums remain in the heap are the smallest n sums
    }
}

Then the n elements in the heap are the smallest n sums.

The time complexity is O(n^2 * lgn), space complexity is O(n).

Storing pairwise sums in linear space

3 Answers3