I am using a priority queue act as a min heap such that I can merge lists. These lists are on separate disks and hence this style of merging should be most efficient.
When I run the following code on small datasets (-1000 -> 1000 range) they are sorted perfectly. However when I run it on a larger range the priority queue isn't correctly sorted.
This is the code that I am using to create and initially populate the Priority Queue.
RandomAccessFile sourceRandom = new RandomAccessFile(file1, "r");
RandomAccessFile destRandom = new RandomAccessFile(file2, "rw");
PriorityQueue<HeapNode> minHeap = new PriorityQueue<>(1, (o1, o2) -> o1.Value - o2.Value);
//create and populate an array of indexes for the start of each unmerged partition
long[] startIdxArray = new long[k];
for(int ki = 0; ki < k ; ki++){
startIdxArray[ki] = sortableSize * (long)ki;
}
long[] idxArray = startIdxArray.clone();
//populate the heap with the first value from the partition
for(int ki = 0; ki < k; ki++){
sourceRandom.seek(idxArray[ki]);
minHeap.add(new HeapNode(sourceRandom.readInt(), ki));
idxArray[ki] += 4;
}
HeapNode Class:
class HeapNode{
public int kIdx;
public int Value;
public HeapNode(int value, int KIdx){
kIdx = KIdx;
Value = value;
}
}
This is the output of the minHeap:
This is formatted by Intellij in the debugging pane.
Additionally this is what is actually output by the minHeap.poll()
method.
I would just like to additionally note that this code works 100% when the values that are read into the priority queue are ~[-1000 -> 1000]:
[4, 3, 2, 1, 128, 256, 1024, 65536, -128, -256, 512, -227, -1]
works correctly when split into 7 lists of two elements.
These were also stopped at the same point of execution: just after the pasted code.
I am using a minimum heap for O(log(n))
extraction of the minimum term resulting on O(n log(n))
sorting time rather than O(n^2)
Thanks for any help you can give me with this or even steps to take to try and debug this.
Notes:
This is a windows 10 PC running Win10Home,
JDK: 1.8.0_112
(this is what the folder is called in ProgramFiles
at least