Assume you read data items and associated scores from a "stream" source (i.e. no random access or multiple passes possible).
What is the best way of keeping, at any time, only those elements in memory with lowest weight encountered so far. I would be interested in the "Java" way of doing it, the shorter the idiom the better, rather than algorithm ("use search-tree, insert new element, delete biggest if size exceeded").
Below is the solution I came up with, however I find it a bit lengthy, also some behaviour might be unexpected (the same item with different scores is possibly kept multiple times, while the same item added with the same score is kept only once). I also feel there should be something existing for this.
import java.util.AbstractMap.SimpleEntry;
import java.util.Map.Entry;
import java.util.Comparator;
import java.util.TreeSet;
/**
* Stores the n smallest (by score) elements only.
*/
public class TopN<T extends Comparable<T>> {
private TreeSet<Entry<T, Double>> elements;
private int n;
public TopN(int n) {
this.n = n;
this.elements = new TreeSet<Entry<T, Double>>(
new Comparator<Entry<T, Double>>() {
@Override
public int compare(Entry<T, Double> o1, Entry<T, Double> o2) {
if (o1.getValue() > o2.getValue()) return 1;
if (o1.getValue() < o2.getValue()) return -1;
return o1.getKey() == null ? 1 : o1.getKey().compareTo(o2.getKey());
}
});
}
/**
* Adds the element if the score is lower than the n-th smallest score.
*/
public void add(T element, double score) {
Entry<T, Double> keyVal = new SimpleEntry<T, Double>(element,score);
elements.add(keyVal);
if (elements.size() > n) {
elements.pollLast();
}
}
/**
* Returns the elements with n smallest scores.
*/
public TreeSet<Entry<T, Double>> get() {
return elements;
}
}
There is a similar question, but it doesn't include the stream source / memory requirement: Find top N elements in an Array