I have an array of elements (in this example, simply integers) that are compared using a custom comparator. In this example, I simulate the comparator by defining i SMALLER j if and only if scores[i] <= scores[j].
I have two approaches:
- using a heap of the current k candidates
- using an array of the current k candidates
I update these two structures in the following way:
- heap: with the methods PriorityQueue.poll and PriorityQueue.offer,
- array: the index top of the worst among the top k candidates in the array of candidates is stored. If a newly seen example is better than the element at index top, the latter is replaced by the former, and top is updated by iterating through all k elements of the array.
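To make the two update rules concrete, here is a minimal sketch of both, stripped of the timing code. The class name KSmallestSketch and the method names kSmallestHeap and kSmallestArray are illustrative only and are not part of my benchmark. The sketch assumes k >= 1, and both methods return the indices of the k smallest scores in ascending index order so that the results can be compared directly.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

// Illustrative sketch only; the names are hypothetical. Assumes k >= 1.
final class KSmallestSketch {

    // Heap variant: a max-heap of indices ordered by score,
    // so peek() always returns the worst of the current candidates.
    static int[] kSmallestHeap(final double[] scores, int k) {
        PriorityQueue<Integer> heap = new PriorityQueue<Integer>(k, new Comparator<Integer>() {
            @Override
            public int compare(Integer a, Integer b) {
                return Double.compare(scores[b], scores[a]); // reversed: largest score first
            }
        });
        for (int i = 0; i < scores.length; i++) {
            if (heap.size() < k) {
                heap.offer(i);
            } else if (scores[heap.peek()] > scores[i]) {
                heap.poll();   // drop the current worst candidate
                heap.offer(i); // insert the better one
            }
        }
        int[] result = new int[heap.size()];
        int n = 0;
        for (int idx : heap) {
            result[n++] = idx;
        }
        Arrays.sort(result);
        return result;
    }

    // Array variant: top is the position of the worst candidate;
    // after each replacement it is recomputed with a linear scan over the candidates.
    static int[] kSmallestArray(double[] scores, int k) {
        int size = Math.min(k, scores.length);
        int[] candidates = new int[size];
        int top = 0;
        for (int i = 0; i < scores.length; i++) {
            if (i < size) {
                candidates[i] = i;
                if (scores[candidates[top]] < scores[i]) top = i;
            } else if (scores[candidates[top]] > scores[i]) {
                candidates[top] = i;
                top = 0;
                for (int j = 1; j < size; j++) {
                    if (scores[candidates[top]] < scores[candidates[j]]) top = j;
                }
            }
        }
        Arrays.sort(candidates);
        return candidates;
    }
}
```

A replacement costs O(log k) comparator calls on boxed Integer values in the heap variant and a scan of k primitive entries in the array variant. When scores contain ties, the two variants may keep different but equally good indices.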
However, when I tested which of the two approaches is faster, I found that it is the second one. My questions are:
- Is my use of PriorityQueue suboptimal?
- What is the fastest way to compute the k smallest elements?
I am interested in the case where the number of examples can be large but the number of neighbours is relatively small (between 10 and 20).
Here is the code:
public static void main(String[] args) {
long kopica, navadno, sortiranje;
int numTries = 10000;
int numExamples = 1000;
int numNeighbours = 10;
navadno = testSimple(numExamples, numNeighbours, numTries);
kopica = testHeap(numExamples, numNeighbours, numTries);
sortiranje = testSort(numExamples, numNeighbours, numTries, false);
System.out.println(String.format("tries: %d examples: %d neighbours: %d\n time heap[ms]: %d\n time simple[ms]: %d", numTries, numExamples, numNeighbours, kopica, navadno));
}
public static long testHeap(int numberExamples, int numberNeighbours, int numberTries){
Random rnd = new Random(123);
long startTime = System.currentTimeMillis();
for(int iteration = 0; iteration < numberTries; iteration++){
final double[] scores = new double[numberExamples];
for(int i = 0; i < numberExamples; i++){
scores[i] = rnd.nextDouble();
}
PriorityQueue<Integer> myHeap = new PriorityQueue<Integer>(numberNeighbours, new Comparator<Integer>(){
@Override
public int compare(Integer o1, Integer o2) {
return -Double.compare(scores[o1], scores[o2]);
}
});
int top;
for(int i = 0; i < numberExamples; i++){
if(i < numberNeighbours){
myHeap.offer(i);
} else{
top = myHeap.peek();
if(scores[top] > scores[i]){
myHeap.poll();
myHeap.offer(i);
}
}
}
}
long endTime = System.currentTimeMillis();
return endTime - startTime;
}
public static long testSimple(int numberExamples, int numberNeighbours, int numberTries){
Random rnd = new Random(123);
long startTime = System.currentTimeMillis();
for(int iteration = 0; iteration < numberTries; iteration++){
final double[] scores = new double[numberExamples];
for(int i = 0; i < numberExamples; i++){
scores[i] = rnd.nextDouble();
}
int[] candidates = new int[numberNeighbours];
int top = 0;
for(int i = 0; i < numberExamples; i++){
if(i < numberNeighbours){
candidates[i] = i;
if(scores[candidates[top]] < scores[candidates[i]]) top = i;
} else{
if(scores[candidates[top]] > scores[i]){
candidates[top] = i;
top = 0;
for(int j = 1; j < numberNeighbours; j++){
if(scores[candidates[top]] < scores[candidates[j]]) top = j;
}
}
}
}
}
long endTime = System.currentTimeMillis();
return endTime - startTime;
}
This produces the following result:
tries: 10000 examples: 1000 neighbours: 10
time heap[ms]: 393
time simple[ms]: 388