53

I am calculating a large number of possible resulting combinations of an algortihm. To sort this combinations I rate them with a double value und store them in PriorityQueue. Currently, there are about 200k items in that queue which is pretty much memory intesive. Acutally, I only need lets say the best 1000 or 100 of all items in the list. So I just started to ask myself if there is a way to have a priority queue with a fixed size in Java. I should behave like this: Is the item better than one of the allready stored? If yes, insert it to the according position and throw the element with the least rating away.

Does anyone have an idea? Thanks very much again!

Marco

Marco
  • 2,005
  • 4
  • 16
  • 9
  • Possible duplicate of [Is there a PriorityQueue implementation with fixed capacity and custom comparator?](http://stackoverflow.com/questions/7878026/is-there-a-priorityqueue-implementation-with-fixed-capacity-and-custom-comparato) – Raedwald Apr 18 '16 at 07:11
  • 1
    @Raedwald: Uhm, this question was asked almost two years _before_ the one you say it's a duplicate. Think you might have it the round way wrong. ;-) – Amos M. Carpenter Apr 18 '16 at 07:32

7 Answers7

45
que.add(d);
if (que.size() > YOUR_LIMIT)
     que.poll();

or did I missunderstand your question?

edit: forgot to mention that for this to work you probably have to invert your comparTo function since it will throw away the one with highest priority each cycle. (if a is "better" b compare (a, b) should return a positvie number.

example to keep the biggest numbers use something like this:

public int compare(Double first, Double second) {
            // keep the biggest values
            return first > second ? 1 : -1;
        }
getekha
  • 2,495
  • 3
  • 18
  • 20
  • 1
    Good answer but I would prefer a reverse of this: if (que.size() >= YOUR_LIMIT) que.poll(); que.add(d); By doing this, the java priorityqueue would not resize the array if we fix the YOUR_LIMIT as the size of the heap – Ankit Bhatnagar Oct 13 '17 at 17:35
  • 2
    @AnkitBhatnagar, that won't work. That will unconditionally remove the old head. getakha's answer removes whichever head is worse. – Jetpack Mar 18 '19 at 22:18
  • It won't work since one may intend to keep the biggest value one the top of the heap(head), avaible to poll() in constant time, while evicting the tail if the max size is met. Guava's MinMaxPriorityQueue can achieve this, since it allows to you reach head and tail of the queue in constant time. – AwesomeHunter Dec 08 '20 at 10:50
17

MinMaxPriorityQueue, Google Guava

There is indeed a class for maintaining a queue that, when adding an item that would exceed the maximum size of the collection, compares the items to find an item to delete and thereby create room: MinMaxPriorityQueue found in Google Guava as of version 8.

EvictingQueue

By the way, if you merely want deleting the oldest element without doing any comparison of the objects’ values, Google Guava 15 gained the EvictingQueue class.

i0x539
  • 4,763
  • 2
  • 20
  • 28
Basil Bourque
  • 303,325
  • 100
  • 852
  • 1,154
  • 3
    It almost seem like Guava discourages the use of MinMaxPriorityQueue if you only look at one edge of the queue. See the "Performance Notes": https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/collect/MinMaxPriorityQueue.html – Tarrasch Sep 16 '16 at 04:31
  • @Tarrasch I am no expert on the topic, but my reading of those notes says that (a) if you want automatic eviction rather than manual, and (b) you are setting a maximum size, then this class is appropriate. – Basil Bourque Aug 22 '22 at 07:03
5

There is a fixed size priority queue in Apache Lucene: http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/util/PriorityQueue.html

It has excellent performance based on my tests.

2

Use SortedSet:

SortedSet<Item> items = new TreeSet<Item>(new Comparator<Item>(...));
...
void addItem(Item newItem) {
    if (items.size() > 100) {
         Item lowest = items.first();
         if (newItem.greaterThan(lowest)) {
             items.remove(lowest);
         }
    }

    items.add(newItem);   
}
Victor Sorokin
  • 11,878
  • 2
  • 35
  • 51
2

Just poll() the queue if its least element is less than (in your case, has worse rating than) the current element.

static <V extends Comparable<? super V>> 
PriorityQueue<V> nbest(int n, Iterable<V> valueGenerator) {
    PriorityQueue<V> values = new PriorityQueue<V>();
    for (V value : valueGenerator) {
        if (values.size() == n && value.compareTo(values.peek()) > 0)
            values.poll(); // remove least element, current is better
        if (values.size() < n) // we removed one or haven't filled up, so add
            values.add(value);
    }
    return values;
}

This assumes that you have some sort of combination class that implements Comparable that compares combinations on their rating.

Edit: Just to clarify, the Iterable in my example doesn't need to be pre-populated. For example, here's an Iterable<Integer> that will give you all natural numbers an int can represent:

Iterable<Integer> naturals = new Iterable<Integer>() {
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            int current = 0;
            @Override
            public boolean hasNext() {
                return current >= 0;
            }
            @Override
            public Integer next() {
                return current++;
            }
            @Override
            public void remove() {
                throw new UnsupportedOperationException();
            }
        };
    }
};

Memory consumption is very modest, as you can see - for over 2 billion values, you need two objects (the Iterable and the Iterator) plus one int.

You can of course rather easily adapt my code so it doesn't use an Iterable - I just used it because it's an elegant way to represent a sequence (also, I've been doing too much Python and C# ☺).

gustafc
  • 28,465
  • 7
  • 73
  • 99
  • Does this assume you have all the items in `valueGenerator` already? – vahidg Dec 04 '09 at 12:40
  • I think one of the goals of the OP is to avoid accumulating so many items in an `Iterable` in the first place. Furthermore, if the higher the ranking the better the algorithm, then `peek` is not what you want. – vahidg Dec 04 '09 at 12:42
  • No, you don't need to have them all available. An iterator can generate values on the fly in its `next()` method. – gustafc Dec 04 '09 at 13:06
  • And why wouldn't `peek()` do the trick? It returns the least element, and if the current element is better than the least element, I throw the least element away and add the current. I tested the code, it works. – gustafc Dec 04 '09 at 13:10
  • If you doubt, just try the code - it works. Quoting the JavaDocs: "The head of this queue is the *least* element with respect to the specified ordering. [...] The queue retrieval operations `poll`, `remove`, `peek`, and `element` access the element at the head of the queue." As I say in the post: I assume that whatever is used to represent a combination implements `Comparable` in a way that considers a lower-rated combination as "less than" a better-rated one. If it doesn't and can't, I leave it as an excercise for the reader to modify my example so that it uses a custom comparator. – gustafc Dec 04 '09 at 13:58
  • 1
    Yep u're right, the head is indeed the least element. For some reason I thought it was the other way around. – vahidg Dec 04 '09 at 14:10
0

A better approach would be to more tightly moderate what goes on the queue, removing and appending to it as the program runs. It sounds like there would be some room to exclude some the items before you add them on the queue. It would be simpler than reinventing the wheel so to speak.

Gordon
  • 4,823
  • 4
  • 38
  • 55
-1

It seems natural to just keep the top 1000 each time you add an item, but the PriorityQueue doesn't offer anything to achieve that gracefully. Maybe you can, instead of using a PriorityQueue, do something like this in a method:

List<Double> list = new ArrayList<Double>();
...
list.add(newOutput);
Collections.sort(list);
list = list.subList(0, 1000);
vahidg
  • 3,953
  • 2
  • 22
  • 30
  • 1
    also using a TreeMap, you have the highest value readily available and you can avoid insertions altogether if the current result is greater than that, removing the last key and inserting the new value otherwise – Lorenzo Boccaccia Dec 04 '09 at 12:03
  • 1
    @Lorenzo, `Map` isn't good as it will not allow two combinations having the same rating. – gustafc Dec 04 '09 at 12:27
  • 4
    this approach does not have the performance benefits of black red tree implementation and performance killer – nimcap Mar 17 '10 at 15:09
  • This will have vey bad performance as you sort the array each time. Adding to a heap will be much faster. – ThatDataGuy Apr 26 '20 at 20:39
  • This will reduce the performance drastically since we are sorting the list everytime we add an element. – Pramod Oct 16 '21 at 15:28