
The question is pretty much in the title, but say I have a list L

L = [1,2,3,4,5]

min(L) = 1 here. Now I remove 4. The min is still 1. Then I remove 2. The min is still 1. Then I remove 1. The min is now 3. Then I remove 3. The min is now 5, and so on.

I am wondering if there is a good way to keep track of the min of the list at all times without needing to do min(L) or scanning through the entire list, etc.

There is an efficiency cost to actually removing the items from the list because it has to move everything else over. Re-sorting the list each time is expensive, too. Is there a way around this?

John Smith
  • If the list is sorted as above, then just `list[0]`? – daniel gratzer Jul 22 '13 at 16:35
  • Keep a copy of the sorted list and then alist[0]? – zhangyangyu Jul 22 '13 at 16:36
  • There is a cost to removing elements from a list since it has to move everything else over. I am wondering if somehow there's a better way to do things using a static array perhaps. – John Smith Jul 22 '13 at 16:36
  • @JohnSmith: What are you trying to do here then? Are you looking for a [heap queue](http://docs.python.org/2/library/heapq.html) perhaps? – Martijn Pieters Jul 22 '13 at 16:39
  • @MartijnPieters I don't think so -- I'm trying to keep track of min(L) at every step after a particular element is to be removed from consideration. But it's costly to do min(L) or re-sort every single step for large lists. I think it's also costly to remove items one-by-one until the list is empty. – John Smith Jul 22 '13 at 16:40
  • 1
    Are you allowed to have an auxiliary data structure? Or are you attempting to do this in place? Does it have to be a list or can it be something else? – Brent Writes Code Jul 22 '13 at 16:42
  • Sounds like you want to study *some* form of tree at the very least then. – Martijn Pieters Jul 22 '13 at 16:42
  • @BrentNash Yes, a list (I assume), and I assume in-place. I'd have to see the other methods to tell for sure. – John Smith Jul 22 '13 at 16:43
  • 2
    @JohnSmith I think you're screwed there. If there is no ordering in the list, and you're not keeping an ordered version of the list elsewhere, then this is *O(n)* per update. With an auxiliary list this'd be *O(1)* for a deletion, *O(n)* for insertion, with an auxiliary heap *O(log n)* for both. – millimoose Jul 22 '13 at 16:44
  • How large can the items of L be? – Fallen Jul 22 '13 at 17:00
  • @JohnSmith My reasoning there is: when you remove the minimum from a list, you **need** to know, beforehand, what was the second-smallest element to "efficiently" (in sub-linear time) find out the new minimum. (Ignoring the possibility of a binary search over an ordered list which you've already excluded.) – millimoose Jul 22 '13 at 17:09
  • That said, there's the option of just keeping the minimum explicitly (`min_l`), as well as the number of times the minimum is in the list (`min_count`), and annotating the items of the list with a boolean flag (`is_min`) that determines whether they're the minimum. Assuming removals of the minimum are rare, you wouldn't need to do anything when a non-minimal element is removed; when a duplicate minimum is removed you only need to decrement `min_count`, and only when the last minimal element is removed do you need to search for and count the occurrences of the new minimum. (A sketch of this idea follows the comment thread.) – millimoose Jul 22 '13 at 17:14
  • @millimoose That is very similar to what I am trying now -- it seems to be the best bet – John Smith Jul 22 '13 at 17:15
  • Depending on your use case this might be easier and/or faster than keeping a separate data structure, but it's not perfectly "in-place". (Instead of a list of keys, you'd need to store a list of `(key, is_min)` tuples.) That said, I would strongly consider just using a parallel list and a min-heap since both are readily available data structures, or at least benchmarking the two with something approximating your usage patterns. – millimoose Jul 22 '13 at 17:16
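
A minimal sketch of the bookkeeping millimoose describes above, assuming items are removed by value and duplicates are allowed; the class `MinTracker` and its method names are invented for illustration:

```python
class MinTracker:
    def __init__(self, items):
        self.items = list(items)          # "alive" items; order is irrelevant
        self.min_l = min(self.items)      # current minimum
        self.min_count = self.items.count(self.min_l)

    def remove(self, x):
        i = self.items.index(x)           # O(n) lookup; a value->index map could make this O(1)
        self.items[i] = self.items[-1]    # swap-with-last avoids shifting everything over
        self.items.pop()
        if x == self.min_l:
            self.min_count -= 1
            if self.min_count == 0 and self.items:
                # The only costly case: last copy of the minimum is gone, rescan in O(n).
                self.min_l = min(self.items)
                self.min_count = self.items.count(self.min_l)

    def minimum(self):
        return self.min_l

t = MinTracker([1, 2, 3, 4, 5])
for x in [4, 2, 1, 3]:
    t.remove(x)
    print(t.minimum())   # 1, 1, 3, 5 -- matching the sequence in the question
```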

5 Answers


To remove a random element you need to know what elements have not been removed yet.

To know the minimum element, you need to sort or scan the items.

A min heap implemented as an array neatly solves both problems. The cost to remove an item is O(log N) and the cost to find the min is O(1). The items are stored contiguously in an array, so choosing one at random is very easy, O(1).

The min heap is described on [this Wikipedia page](https://en.wikipedia.org/wiki/Binary_heap).

BTW, if the data are large, you can leave them in place and store pointers or indexes in the min heap and adjust the comparison operator accordingly.
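
A sketch of this approach in Python; `remove_random` and the sift helpers are hand-rolled here because `heapq` has no public API for deleting at an arbitrary index:

```python
import heapq
import random

def _sift_down(heap, i):
    # Push heap[i] down while it is larger than its smallest child.
    n = len(heap)
    while True:
        smallest = i
        for c in (2 * i + 1, 2 * i + 2):          # children of i
            if c < n and heap[c] < heap[smallest]:
                smallest = c
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def _sift_up(heap, i):
    # Pull heap[i] up while it is smaller than its parent.
    while i > 0:
        parent = (i - 1) // 2
        if heap[i] >= heap[parent]:
            return
        heap[i], heap[parent] = heap[parent], heap[i]
        i = parent

def remove_random(heap):
    """Remove and return a uniformly random element in O(log n)."""
    i = random.randrange(len(heap))
    removed = heap[i]
    last = heap.pop()            # shrink the array; O(1), no shifting
    if i < len(heap):
        heap[i] = last
        _sift_down(heap, i)      # at most one of these two does any work
        _sift_up(heap, i)
    return removed

L = [1, 2, 3, 4, 5]
heapq.heapify(L)                 # O(n) build
while L:
    print("min =", L[0])         # O(1) find-min at every step
    remove_random(L)
```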

Doug Currie
  • A min-heap inherently does exactly this. – Manny D Jul 22 '13 at 16:51
  • Incorrect. The cost to remove the *smallest* item is O(lg n). The cost to remove an arbitrary item is O(n). – chepner Jul 22 '13 at 16:57
  • You can delete an element by replacing it with its smallest child, and then recursing. It is O(log N). I.e., every element in a heap can be viewed as the root of a sub-min-heap. See: http://cs.stackexchange.com/questions/6990/deletion-in-min-max-heaps – Doug Currie Jul 22 '13 at 17:03
  • 1
    But it takes O(n) to even *find* the item before you can replace it with anything. Starting from the root, you don't know which subheap it belongs to, so you essentially have to walk the entire data structure to find it. – chepner Jul 22 '13 at 17:08
  • 3
    You don't need to find it; we're selecting it randomly by position in the min-heap array. – Doug Currie Jul 22 '13 at 17:10
  • Ah, ok, I kept missing the "random" aspect of choosing the item to remove. One small modification, though. You need to replace the item with its *largest* child in order to let it fall off the heap. However, you can simply find the largest item in the list in O(n) while building the heap, then use that for each removal. – chepner Jul 22 '13 at 17:21
  • Removing a random element from a binary heap is actually O(1) in expectation. – David Eisenstat Jul 22 '13 at 18:18
  • @David Eisenstat, as the link I gave in the comment explains (and I got wrong in an earlier part of the comment) a delete can be done in O(log N) using decrease-key followed by delete-min. – Doug Currie Jul 22 '13 at 20:04
  • @DougCurrie It's faster just to downheap the removed element. – David Eisenstat Jul 22 '13 at 20:09
  • @David Eisenstat, yes downheap is faster, but it is possible that the heap will become imbalanced as a result, so you lose O(log N) for later operations. (assuming by downheap you mean what I suggested earlier, swap with min child, recurse) – Doug Currie Jul 22 '13 at 20:14
  • You're right. This isn't a great approach unless worst-case complexity is important. – David Eisenstat Jul 22 '13 at 20:25

Google for self-balancing binary search trees. Building one from the initial list takes O(n lg n) time, and finding and removing an arbitrary item takes O(lg n) (instead of O(n) for finding/removing from a simple list). The smallest item is always in the leftmost node of the tree, reachable in O(lg n).

This question may be useful. It provides links to several implementations of various balanced binary search trees. The advice to use a hash table does not apply well to your case, since a hash table does not address maintaining a minimum item.
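
If rolling your own tree is unappealing, the third-party `sortedcontainers` package (my suggestion, not something this answer names) offers `SortedList`, which gives roughly the same complexity profile: removal of an arbitrary value in about O(lg n), with the minimum always at index 0. A sketch:

```python
# pip install sortedcontainers
from sortedcontainers import SortedList

s = SortedList([1, 2, 3, 4, 5])    # built by sorting the initial list
for x in [4, 2, 1, 3]:
    s.remove(x)                    # locate and delete an arbitrary value
    print("min =", s[0])           # the minimum is always at index 0
```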

chepner

Here's a solution that needs O(N lg N) preprocessing time, O(lg N) time per deletion, and O(lg² N) time per find-min query.

Preprocessing:

Step 1: sort L.

Step 2: for each item L[i], build a map L[i] -> i.

Step 3: build a Binary Indexed Tree (or a segment tree) over positions 1 <= i <= len(L), with BIT[i] = 1 initially, and maintain prefix/range sums.

Query type delete:

Step 1: when an item x is to be removed, find its index either with a binary search on the sorted array L or directly from the map, then set BIT[index[x]] = 0 and update the affected sums. Runtime: O(lg N)

Query type findMin:

Step 1: do a binary search over array L. For each mid, compute the prefix sum of BIT over 1..mid. If that sum is > 0, we know some value <= L[mid] is still alive, so set hi = mid - 1; otherwise set lo = mid + 1. Runtime: O(lg² N)

The same can be done with a segment tree.

Edit: if I'm not mistaken, each query can be processed in O(1) with a linked list.
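
A minimal sketch of the whole scheme in Python, assuming distinct items (the class name `MinViaBIT` is invented for illustration):

```python
class MinViaBIT:
    def __init__(self, items):
        self.sorted = sorted(items)                                # Step 1: sort L
        self.rank = {x: i + 1 for i, x in enumerate(self.sorted)}  # Step 2: item -> 1-based index
        self.tree = [0] * (len(items) + 1)
        for i in range(1, len(items) + 1):                         # Step 3: BIT[i] = 1 for all i
            self._update(i, 1)

    def _update(self, i, delta):
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & (-i)

    def _prefix_sum(self, i):              # number of alive items with rank <= i
        s = 0
        while i > 0:
            s += self.tree[i]
            i -= i & (-i)
        return s

    def delete(self, x):                   # O(lg N)
        self._update(self.rank[x], -1)

    def find_min(self):                    # O(lg^2 N): binary search over prefix sums
        lo, hi, ans = 1, len(self.sorted), None
        while lo <= hi:
            mid = (lo + hi) // 2
            if self._prefix_sum(mid) > 0:  # some value with rank <= mid is still alive
                ans, hi = mid, mid - 1
            else:
                lo = mid + 1
        return self.sorted[ans - 1] if ans else None

m = MinViaBIT([1, 2, 3, 4, 5])
for x in [4, 2, 1, 3]:
    m.delete(x)
    print("min =", m.find_min())           # 1, 1, 3, 5
```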

Fallen

If sorting isn't in your best interest, I would suggest only doing comparisons where you need them. If you remove elements that are not the old minimum, and you aren't inserting any new elements, no rescan for the minimum is necessary.

Can you give us some more information about the processing going on that you are trying to do?

Comment answer: You don't have to compute min(L) each time. Just keep track of the minimum's index, and only re-run the scan for min(L) when you remove the element at the old index (adjusting the index when you remove elements before it).
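
A rough sketch of that bookkeeping in Python (the helper `remove_at` is my invention, and the index-adjustment rule follows Fredrik's comment below):

```python
L = [1, 2, 3, 4, 5]
min_idx = L.index(min(L))     # one initial O(n) scan

def remove_at(i):
    """Remove L[i], keeping min_idx pointing at the minimum."""
    global min_idx
    del L[i]
    if i < min_idx:
        min_idx -= 1          # everything after i shifted one step left
    elif i == min_idx:
        # The only costly case: the minimum itself was removed.
        min_idx = L.index(min(L)) if L else None

remove_at(3)                  # remove 4; L[min_idx] is still 1
remove_at(1)                  # remove 2; still 1
remove_at(0)                  # remove 1; triggers a rescan
print(L[min_idx])             # -> 3
```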

jdero
  • My list is large and I am "removing" elements from consideration until the list is empty, but I need to keep track of the min at each step. However, it's costly to do min(L) or sort the list every single time. – John Smith Jul 22 '13 at 16:39
  • @JohnSmith Is running it every time the min is removed, as I indicated in my response, too costly? – jdero Jul 22 '13 at 16:41
  • You cannot just keep the index of the minimum, as you'll have to take into account removals of elements with a lower index than the current minimum. – Martijn Pieters Jul 22 '13 at 16:41
  • Well, if that's the case, couldn't he just use the current index and reset (Edit: not necessarily "reset" but more, "monitor") the index based on the number of indexes to the left or to the right, and *keep track* of the index even if it moves? – jdero Jul 22 '13 at 16:44
  • Keeping the index updated is trivial, though. If you remove an item to the left of the index, it moves one step to the left. If you remove an item to the right, you don't have to do anything. – Fredrik Jul 22 '13 at 16:45
  • Yeah. I see what you're getting at, that you'll be doing a max of n comparisons anyway... – jdero Jul 22 '13 at 16:47

Your current approach of rescanning when the minimum is removed takes O(1) time in expectation per removal (assuming every item is equally likely to be removed).

Given a list of n items, a rescan is necessary with probability 1/n, so the expected work at each step is n * 1/n = O(1).
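
A quick empirical check (a hypothetical test harness, not part of the answer): the measured average work per removal stays bounded as n grows, matching the n * 1/n = O(1) argument.

```python
import random

def average_work_per_removal(n, trials=50):
    total = 0
    for _ in range(trials):
        items = list(range(n))            # distinct items, min is 0
        cur_min = 0
        while items:
            i = random.randrange(len(items))
            removed = items[i]
            items[i] = items[-1]          # swap-with-last: O(1) removal
            items.pop()
            if removed == cur_min and items:
                total += len(items)       # charge the cost of this rescan
                cur_min = min(items)
    return total / (trials * n)

print(average_work_per_removal(1000))     # stays near a small constant as n grows
```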

David Eisenstat