Extracting 2 numbers n times and placing back the addition in O(n) instead of O(n*log(n))

Question

I'm presenting a problem my professor showed in class, with my O(n*log(n)) solution:

Given a list of n numbers we'd like to perform the following n-1 times:

Extract the two minimal elements x,y from the list and present them
Create a new number z , where z = x+y
Put z back into the list

Suggest a data structure and algorithm for O(n*log(n)) , and O(n)

Solution:

We'll use a minimal heap:

Creating the heap one time only would take O(n). After that, extracting the two minimal elements would take O(log(n)). Placing z into the heap would take O(log(n)).

Performing the above n-1 times would take O(n*log(n)), since:

O(n)+O(n∙(logn+logn ))=O(n)+O(n∙logn )=O(n∙logn )

But how can I do it in O(n)?

EDIT:

By saying: "extract the two minimal elements x,y from the list and present them ", I mean printf("%d,%d" , x,y), where x and y are the smallest elements in the current list.

Just to make sure it's clear: you want to reduce the array to a single element, each step removing the 2 minimal elements and injecting back their sum, and you want to do this in O(n) time? — cheeken, Jun 19 '12 at 00:35
Do you know anything about the sizes of the numbers in the list? Are they guaranteed to be within some range? — templatetypedef, Jun 19 '12 at 00:49
There's no way you could have misunderstood the problem, is there? :) My kneejerk reaction is to say this is impossible. Clearly, you need to iterate over the array (n-1) times, which means you'll be peforming _some_ operation on the scale of O(n) times. To achieve an overall O(n) time, you'd need that operation to be O(1). You can find the two minimum values in O(1) time in a sorted list, but injecting a value back into the list and maintaining the sort would be log(n). Unless you want to get into radix sort, but that's a bit fishy. — cheeken, Jun 19 '12 at 00:49
Proof of cheeken's kneejerk reaction: Suppose you can do this. Suppose further that when you insert `z` into the list, you stick a flag on it to say "this is a computed value, not an original value". Suppose finally that when you present the numbers, you only print out the ones with the flag unset. Then you have sorted your list of numbers in `O(n)`. Therefore, some kind of skullduggery is required, such as for example radix sort on fixed-size integers. — Steve Jessop, Jun 19 '12 at 00:52
If the maximum input number (m) is finite, and the length of the list (n) is finite, then I *think* it would be possible to do this in O(n) time and O(m*n) space (not including memory initialization). — Kendall Frey, Jun 19 '12 at 00:57
It seems that O(n) is possible if: 1) The list is sorted 2) the sum of two min values always become the largest value, so you can stick `z` to the end of the list. For example: `[3,4,5,6]` => `[5,6,7]` => `[7,11]` => `[18]`. — Akavall, Jun 19 '12 at 01:13
@Akavall yes, and sorting a list is constant time if it's a precondition that it's already sorted (I joke). If that was the case it should have been stated in the original problem. I really hope the OP asks thier prof the next time he sees them because so far as I can tell it's impossible. Even if you use bucket sort (O(n)), dealing with sparcity is O(m) (m >= n). Worst case scenario, O(n+m). Unless there are no gaps in the data. But the question doesn't say that. — acattle, Jun 19 '12 at 01:21
@acattle, I was pointing out the obvious, I guess. But I just can't see any other way! — Akavall, Jun 19 '12 at 01:25
I heavily suspect the prof is either trolling the students or had a few too many at lunch — acattle, Jun 19 '12 at 01:27
@ron - did your professor show the O(n) solution, so you can confirm that there is in fact a legitimate O(n) solution that satisfies the problem as presented (with no other assumptions or constraints)? — hatchet - done with SOverflow, Jun 19 '12 at 01:41
@hatchet: No , I've send him an email 15 minutes ago.He would answer shortly, I hope ,and then I'll post here his answer . — JAN, Jun 19 '12 at 01:44
@ron - before posting the solution, maybe just say whether you agree it's a legitimate O(n) solution, and then let us stew a little more on it? — hatchet - done with SOverflow, Jun 19 '12 at 01:48
If the numbers are initially unsorted, the problem reduces from sorting and the lower bound is O(n log n), assuming you use comparison and addition only (ask if you need a proof, but it's easy). — sdcvvc, Jun 19 '12 at 06:22
If the range of values is fixed then it can be done in O(n) i suppose — amitkarmakar, Jun 19 '12 at 12:20

btilly · Accepted Answer · 2012-07-01T00:25:47.113

12

This is not a full answer. But if the list was sorted, then your problem is easiy doable in O(n). To do it, arrange all of the numbers in a linked list. Maintain a pointer to a head, and somewhere in the middle. At each step, take the top two elements off of the head, print them, advance the middle pointer until it is where the sum should go, and insert the sum.

The starting pointer will move close to 2n times and the middle pointer will move about n times, with n inserts. All of those operations are O(1) so the sum total is O(n).

In general you cannot sort in time O(n), but there are a number of special cases in which you can. So in some cases it is doable.

The general case is, of course, not solvable in time O(n). Why not? Because given your output, in time O(n) you can run through the output of the program, build up the list of pairwise sums in order as you go, and filter them out of the output. What is left is the elements of the original list in sorted order. This would give a O(n) general sorting algorithm.

Update: I was asked to show how could you go from the output (10, 11), (12, 13), (14, 15), (21, 25), (29, 46) to the input list? The trick is that you always keep everything in order then you know how to look. With positive integers, the next upcoming sum to use will always be at the start of that list.

Step 0: Start
  input_list: (empty)
  upcoming sums: (empty)

Step 1: Grab output (10, 11)
  input_list: 10, 11
  upcoming_sums: 21

Step 2: Grab output (12, 13)
  input_list: 10, 11, 12, 13
  upcoming_sums: 21, 25

Step 3: Grab output (14, 15)
  input_list: 10, 11, 12, 13, 14, 15
  upcoming_sums: 21, 25, 29

Step 4: Grab output (21, 25)
  input_list: 10, 11, 12, 13, 14, 15
  upcoming_sum: 29, 46

Step 5: Grab output (29, 46)
  input_list: 10, 11, 12, 13, 14, 15
  upcoming_sum: 75

edited Jul 01 '12 at 00:25

answered Jun 19 '12 at 02:59

btilly

43,296
3
59
88

Ah, very nice! Surely this plays a part in the solution the professor has in mind. – cheeken Jun 19 '12 at 03:15
Is that possible to do something like that in Python? – Akavall Jun 19 '12 at 03:54
@Akavall it is easy to make a linked list in Python, see http://stackoverflow.com/questions/280243/python-linked-list for several examples. It is, however, going to have a bad constant, so in practice people generally don't go that way. – btilly Jun 19 '12 at 04:15
If you allow for the time required to sort, this is an alternate (and possibly, more straightforward) O(n*log(n)) solution, too. – Novak Jun 19 '12 at 05:34
I'm not sure if i agree. you can sort 99% of the numbers in O(N) using selection algorithm. So if we cab output the solution without using all the numbers then selection would be fine.. conversely If all the numbers are required then it implies x0 + x1 > xn - then radix sort is ok. – Rusty Rob Jun 19 '12 at 09:15
@robertking I do not know what you are disagreeing with. – btilly Jun 19 '12 at 14:37
I just think there is a slight chance that in the cases when it is equivalent to sorting, you can use radix sort. In the cases when it's not equivalent to sorting (e.g. when not all the original numbers are used because they are too large) - you can use selection. So I'm not convinced that it is equivalent to sorting because in the cases when it is equivalent to sorting, it could be precisely those cases when sorting is easy. (i'm probably wrong but you haven't proven to me that you are right) – Rusty Rob Jun 20 '12 at 00:59
@robertking I do not know what mental model you have that leads to that line of thought. Please be assured that it is wrong. Given the sorted list, you can generate the requested output in time `O(n)`. Conversely given the requested output, in time `O(n)`, you can run through the output, generating the list of pairwise sums that are included in the output, filter those out of the output, and emit the original list, in sorted order. Thus generating the requested output is equivalent to sorting. From either you get the other. – btilly Jun 20 '12 at 01:56
Ah sorry, I thought he wanted the first n-1 numbers, but he wanted the first n-1 pairs. (I thought not all the numbers would be included in the output but they obviously are. e.g. with n=5, list = [2,5, 11, 23, 47] , result = (2,5), (7,11), (18, 23), (41, 47) – Rusty Rob Jun 20 '12 at 03:09
1

@robertking I see what you were thinking. But now that we agree on the output, you see how easy it is to generate the sums on the fly, filter them out, and get the original list back. – btilly Jun 20 '12 at 05:02
@btilly Doesn't the filtering take `O(n)` time for each pair so that recovering the original list takes `O(n^2)`? How would you go from `(1,2), (3,5), (8,10), (11,12), (18,23)` to `1,2,5,10,11,12` in `O(n)` time? You need to keep a list of sums which has length `O(n)` and then iterate through that list for each number to do the filtering, right? – Paul Jun 30 '12 at 19:03
@btilly A better example which actually be how could you go from the output `(10, 11), (12, 13), (14, 15), (21, 25), (29, 46)` to the input list: `10, 11, 12, 13, 14, 15`. As in how do you determine that `21`, `25`, `29`, and `46` are not in the original input with `O(1)` filtering? – Paul Jun 30 '12 at 19:08
@ascii-lime I updated to handle your example. It becomes more complicated in detail if you have both positive and negative numbers, but the idea is similar - keep the upcoming sums list in order, and you know where you'll be grabbing. – btilly Jul 01 '12 at 00:26

score 2 · Answer 2 · answered Jun 19 '12 at 01:32

This isn't possible in the general case.

Your problem statement reads that you must reduce your array to a single element, performing a total of n-1 reduction operations. Therefore, the number of reduction operations performed is on the order of O(n). To achieve a overall running time of O(n), each reduction operation must run in O(1).

You have clearly defined your reduction operation:

remove the 2 minimal elements in the array and print them, then
insert the sum of those elements into the array.

If your data structure were a sorted list, it is trivial to remove two minimal elements in O(1) time (pop them off the end of the list). However, reinserting an element in O(1) is not possible (in the general case). As SteveJessop pointed out, if you could insert into a sorted list in O(1) time, the resultant operations would constitute an O(n) sorting algorithm. But there is no such known algorithm.

There are some exceptions here. If your numbers are integers, you may be able to use "radix insert" to achieve O(1) inserts. If your array of numbers are sufficiently sparse in the number line, you may be able to deduce insert points in O(1). There are numerous other exceptions, but they are all exceptions.

This answer doesn't answer your question, per se, but I believe it's relevant enough to warrant an answer.

This is really just a repost of my and others' comments. But given that there appears to be no solution, I thought it prudent to post as an answer. — cheeken, Jun 19 '12 at 01:33
But even with radix sort would it's be O(n+m) where m is the number of buckets (radices? bucket sort == radix sort, but I don't know the proper terminology), m >= n since, since you don't know which buckets have data in them (and must try them all even if they're empty). And even that assuses no negative numbers. If you need to recheck old buckets, O(n + m^2) — acattle, Jun 19 '12 at 01:51
See my answer for the note that if the array is replaced by a linked list, then the reinsert in time `O(1)` is doable. However there is still an implicit sort lurking in the background. — btilly, Jun 19 '12 at 03:01

amitkarmakar · Answer 3 · 2012-06-19T12:48:03.647

If the range of values is less than n, then this can be solved in O(n).

1> Create an array mk of size equal to range of values and initialize it to all zero

2> traverse through the array and increment value of mk at the position of the array element. i.e if the array element is a[i] then increment mk[a[i]]

3) For presenting the answers after each of the n-1 operations follow the following steps:

There are two cases:

Case 1 : all of a[i] are positive

        traverse through mk array from 0 to its size
        cnt = 0
        do this till cnt doesn't equal 2
          grab a nonzero element decrease its value by 1 and increment cnt by 1
        you can get two minimum values in this way
        present them 
        now do mk[sum of two minimum]++

Case 2 : some of a[i] is negative

        <still to update>

score 0 · Answer 4 · answered Jun 26 '12 at 03:56

0

O(nlogn) is easy - just use a heap, treap or skiplist.

O(n) sounds tough.

https://en.wikipedia.org/wiki/Heap_%28data_structure%29
https://en.wikipedia.org/wiki/Treap
https://en.wikipedia.org/wiki/Skip_list

answered Jun 26 '12 at 03:56

user1277476

2,871
12
10

It'd probably be really fast with a splay tree, but that's still probably O(nlogn). Splay trees are nice in that the most recently used nodes get moved near the top of the tree. – user1277476 Jun 26 '12 at 21:37

Extracting 2 numbers n times and placing back the addition in O(n) instead of O(n*log(n))

4 Answers4