I've being studying the different possibilities to add values in a heap in Java. I'm using the PriorityHeap
class. When I noticed slow running times in my application, I decided to give this a look. I'm adding several thousand and, sometimes, millions of custom entries (I have a custom class which has 3 fields: a int, a LongWritable and Text, both from hadoop.io; this instrumentation agent says my records have 200 bytes in average).
Is it obvious that using addAll()
instead of add()
method to put entries in the heap will improve performance, simply because this would avoid several heapify
operations?
I tried with different strategies using the following new example:
package Sorting;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
public class Main {
private static final int HEAP_SIZE = 1000000;
private static final int BULK_LIST_SIZE = HEAP_SIZE / 10;
private static String normal;
private static String bulk;
private static String fullBulk;
public static void main(String[] args) throws IOException {
normal = "";
bulk = "";
fullBulk = "";
long time = 0;
warmup();
normal = "";
bulk = "";
fullBulk = "";
for (int i = 0; i < 100; i++) {
// Normal add time
System.out.println("Normal starts...");
time = normalExecution();
System.out.println("Normal add time " + time);
// Bulk add time
System.out.println("Bulk starts...");
time = bulk();
System.out.println("Bulk add time " + time);
// Bulk add time with list and heap with same size
System.out.println("Full Bulk starts...");
time = fullBulk();
System.out.println("Full Bulk add time " + time);
}
System.out.println(normal);
System.out.println(bulk);
System.out.println(fullBulk);
}
private static long fullBulk() {
long time;
long start;
List<Double> fullBulkList = new ArrayList<Double>(HEAP_SIZE);
PriorityQueue<Double> fullBulkHeap = new PriorityQueue<Double>(HEAP_SIZE);
start = System.nanoTime();
for (int j = 0; j < HEAP_SIZE; j++) {
if (fullBulkList.size() == HEAP_SIZE) {
fullBulkHeap.addAll(fullBulkList);
fullBulkList.clear();
}
}
fullBulkHeap.addAll(fullBulkList);
time = System.nanoTime() - start;
fullBulk = fullBulk + "\t" + time;
fullBulkList = null;
fullBulkHeap = null;
return time;
}
private static long bulk() {
long time;
long start;
List<Double> bulkList = new ArrayList<Double>(BULK_LIST_SIZE);
PriorityQueue<Double> bulkHeap = new PriorityQueue<Double>(HEAP_SIZE);
start = System.nanoTime();
for (int j = 0; j < HEAP_SIZE; j++) {
if (bulkList.size() == BULK_LIST_SIZE) {
bulkHeap.addAll(bulkList);
bulkList.clear();
}
}
bulkHeap.addAll(bulkList);
time = System.nanoTime() - start;
bulk = bulk + "\t" + time;
bulkList = null;
bulkHeap = null;
return time;
}
private static long normalExecution() {
long time;
long start;
PriorityQueue<Double> normalHeap = new PriorityQueue<Double>(HEAP_SIZE);
start = System.nanoTime();
for (int j = 0; j < HEAP_SIZE; j++) {
normalHeap.add(Double.MAX_VALUE);
}
time = System.nanoTime() - start;
normal = normal + "\t" + time;
normalHeap = null;
return time;
}
private static void warmup() {
System.out.println("Starting warmup");
for (int i = 0; i < 1000; i++) {
normalExecution();
bulk();
fullBulk();
}
for (int i = 0; i < 1000; i++) {
bulk();
fullBulk();
normalExecution();
}
for (int i = 0; i < 1000; i++) {
fullBulk();
normalExecution();
bulk();
}
System.out.println("Warmup finished");
}
}
Which resulted in the following results:
The huge peak in the 11th iteration of the normal add method is explained by a GC call: [GC 1347684K->31354K(1446400K), 0.0331610 secs]
.
Mediam values are 16049669, 783724 and 800276 respectively. ST dev are 3512492.89, 244374.17 and 33344.17.