Over the last few months I have written some Java classes implementing data structures, specifically Lists, Binary Search Trees and Binary Heaps. I decided to run a stress test: create an Integer array of n values between 0 and 10*n, then sort it in various ways and measure the times.
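The harness has roughly this shape (a minimal sketch with illustrative names, not my actual test code; the real test passes a Comparator<Integer> to the sort):

import java.util.Arrays;
import java.util.Comparator;
import java.util.Random;

public class StressTest {
    public static void main(String[] args) {
        int n = 100;
        Random random = new Random();

        // n Integer values between 0 and 10*n
        Integer[] data = new Integer[n];
        for (int i = 0; i < n; i++)
            data[i] = random.nextInt(10 * n + 1);

        // Test 1: sort a copy with Arrays.sort(); only the sort itself is timed
        Integer[] copy = Arrays.copyOf(data, n);
        long start = System.nanoTime();
        Arrays.sort(copy, Comparator.naturalOrder());
        long elapsed = System.nanoTime() - start;
        System.out.println("Arrays.sort(): " + elapsed + " ns");
    }
}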
Initially it was just a curiosity. Obviously I expected my classes to cost far more than Java's native Arrays.sort() method. However, when I ran the tests and compared my classes against each other, I found some unexpected surprises.
This is the list of the tests, with details and comments.
1. A copy of the array is created, then the copy is sorted using the Arrays.sort() method, native to Java. The time measured is that of the Arrays.sort() call alone, i.e. the creation of the copy of the array doesn't count. It is the fastest way, as expected.
2. A list is created from the array, then the list is sorted using the Insertion Sort algorithm. The time measured is that of the sorting method alone, i.e. the creation of the list from the array doesn't count. Since the performance of Insertion Sort is not exactly great, this method costs around 50 times the array method.
3. A Binary Search Tree (BST from now on) is created from the array by repeated calls to the add() method (see the tree-sort sketch after this list). The BST is not a balanced tree like the AVL or the Red-Black tree, merely a plain BST as found on Wikipedia: every node has links to three other nodes (parent, leftChild and rightChild), encapsulates a value, etc. This method costs around 500 times the list method, i.e. 25 000 times the array method.
4-5. Two Binary Heaps (BH_1 and BH_2 from now on) are created from the array by repeated calls to the add() method, then they are converted into two (sorted) arrays by repeated calls to the extractMin() method. Both heaps belong to the same class and store their values in a Vector<Integer>. Both heaps cost around 2 times the BST method, i.e. 50 000 times the array method. However, there is a twist.
BH_2 creates the array using the method convertHeapToArray() of the Heap<Integer> interface. convertHeapToArray() calls the method extractMin() n times, and extractMin() in turn calls the method heapify() once.
This doesn't happen in BH_1, which uses the method convertHeapToArray_1(). Instead of calling extractMin(), my "new" method directly executes the code of extractMin() - and where extractMin() would call the method heapify(), BH_1 executes its code instead. In short, a copy-paste that avoids a few method calls.
In theory BH_1 should always cost less than BH_2: same input data, same code, fewer method calls. However, this is true only 73% of the time!
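To make test 3 concrete, the idea is the classic tree sort: insert every element into the (unbalanced) BST with add(), then read the values back with an in-order walk. A minimal sketch of that structure (illustrative only, not my actual class; my nodes also keep a parent link):

import java.util.List;

class BSTNode {
    int value;
    BSTNode leftChild, rightChild; // the parent link is not needed for add() or the walk

    BSTNode(int value) { this.value = value; }

    // Standard unbalanced BST insertion, as described on Wikipedia
    void add(int v) {
        if (v < value) {
            if (leftChild == null) leftChild = new BSTNode(v);
            else leftChild.add(v);
        } else {
            if (rightChild == null) rightChild = new BSTNode(v);
            else rightChild.add(v);
        }
    }

    // An in-order walk visits the values in ascending order
    void inOrderWalk(List<Integer> out) {
        if (leftChild != null) leftChild.inOrderWalk(out);
        out.add(value);
        if (rightChild != null) rightChild.inOrderWalk(out);
    }
}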
My questions are the following:
1. Why does the Binary Search Tree sorting method (computational complexity n log(n), and the tree is expected to be roughly balanced when built by add() alone) cost 500 times as much as an Insertion Sort (computational complexity n^2, preferable only when n is less than 23)?
2. Why do the Binary Heaps (same computational complexity as the Binary Search Tree, and specifically designed to sort quickly) cost 2 times as much as the Binary Search Tree?
3. And, most baffling of all, why does making fewer calls cost more than making more calls in one case out of 4?!
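As a back-of-envelope check on the operation counts behind question 1 (illustrative arithmetic only, no constant factors):

public class OperationCounts {
    public static void main(String[] args) {
        for (int n : new int[]{ 23, 100 }) {
            double quadratic = (double) n * n;                     // Insertion Sort ~ n^2
            double linearithmic = n * (Math.log(n) / Math.log(2)); // BST / heap ~ n log2(n)
            System.out.printf("n=%d: n^2=%.0f, n*log2(n)=%.1f, ratio=%.1f%n",
                              n, quadratic, linearithmic, quadratic / linearithmic);
        }
    }
}

At n = 100 the raw count ratio is only about 15 in favour of the n log(n) structures, so operation counts alone cannot explain a 500-times gap; the cost per operation must differ enormously.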
Code for convertHeapToArray():
public default T[] convertHeapToArray(){
    T[] output = AArrays.createTarray(length(), min());
    int length = length(); // cached: extractMin() shrinks the heap at every call
    for(int i=0 ; i<length ; i++)
        output[i] = this.extractMin();
    return output;
}
public T extractMin() {
    T min = storer.get(indexOfMinimum);
    // Move the last element of the heap into the root's place, then shrink the heap
    AVector.swap(storer, indexOfMinimum, length);
    length--;
    heapify(indexOfMinimum);
    return min;
}
Report (5000 tests, each on a random array of 100 elements):
The array test uses a Comparator<Integer>.
A Comparator<Integer> performs one comparison in 66083 nanoseconds.
The list test uses a Comparator<NodeList<Integer>>.
A Comparator<NodeList<Integer>> performs one comparison in 85973 nanoseconds.
The BST, BH_1 and BH_2 tests use a Relationship<Integer>.
A Relationship<Integer> performs one comparison in 107145 nanoseconds.
The total time for the array sorting is 239 717 392 nanoseconds.
The total time for the list sorting is 984 872 184 nanoseconds.
The total time for the BST sorting is 533 338 808 811 nanoseconds.
The total time for the BH_1 sorting is 1 055 836 440 689 nanoseconds.
The total time for the BH_2 sorting is 1 198 365 741 676 nanoseconds.
The mean time for the array sorting is 47 943 nanoseconds.
The mean time for the list sorting is 196 974 nanoseconds.
The mean time for the BST sorting is 106 667 761 nanoseconds.
The mean time for the BH_1 sorting is 211 167 288 nanoseconds.
The mean time for the BH_2 sorting is 239 673 148 nanoseconds.
The first method for the Binary Heap has been faster than the second 3 634 times out of 5 000.
EDIT:
Re-reading what I wrote, I realized I wasn't exactly clear in my initial question. Allow me to rectify my mistake.
I am aware that there is a difference between the actual execution time of a program and its computational complexity. I never had any doubt about the computational complexity of the methods used: the data structures are simple, and their code is practically taken from Wikipedia. I was certain that the written code was not well performing; it wasn't written with performance in mind to begin with.
My test is about the actual time of execution. The Arrays.sort() method was included as a reference point. Since I didn't write the code with performance in mind, I suspected before the test that the results would be costly. However, my predictions about what exactly would cost more than it should were thrown into the garbage bin by the results.
For example, I believed that fewer method calls would result in a lower cost, but I was wrong. The whole difference between the two Binary Heaps is that the first executes the same code as the second, just with fewer method calls (minimal code for the BinaryHeap is included below). I expected that the first Binary Heap would always have a lower cost: this was proven wrong, and I don't know why.
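To show what I mean by "fewer method calls", this is the kind of hypothetical micro-benchmark I had in mind (illustrative names, not my test code; note that with this naive System.nanoTime() approach the JIT compiler is free to inline and reorder, so results fluctuate):

public class CallOverhead {
    static int square(int x) { return x * x; }

    public static void main(String[] args) {
        int n = 1_000_000;
        long sum = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sum += square(i); // via a method call
        long viaCall = System.nanoTime() - t0;

        t0 = System.nanoTime();
        for (int i = 0; i < n; i++) sum += i * i;     // the same call inlined by hand
        long inlined = System.nanoTime() - t0;

        System.out.println(sum + " | call: " + viaCall + " ns, inlined: " + inlined + " ns");
    }
}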
Another thing that baffled me is that the Binary Heap cost more than the Binary Search Tree. The input array is created randomly, and with this starting condition the height of a Binary Search Tree is expected to be around log(n) (chapter 12, paragraph 4 of Introduction to Algorithms, by Cormen, Leiserson, Rivest, Stein). Still, I had never heard of an algorithm using Binary Search Trees to sort arrays: I included it in my test only as a curiosity. And yet, for a small number of elements (initially 100), the Binary Search Tree was consistently quicker than the Binary Heap.
Why does this happen? When does a Binary Heap start being more convenient? Was including the conversion Heap»Array a mistake?
A third unexpected result is the performance of the doubly-linked List. As with the Binary Search Tree, I included it in the test as a curiosity. The sorting algorithm is a very basic Insertion Sort (sketched below), which is faster only for a number of elements far smaller than what I used, and the code of the entire class is most definitely not written for speed. I expected it to turn out to be the slowest of my classes; instead, it was the quickest! And I still don't know why.
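For reference, the Insertion Sort over the list works the usual way. A sketch with a hypothetical minimal node class (my actual class uses NodeList<Integer> and a Comparator<NodeList<Integer>>):

import java.util.Comparator;

class Node {
    int value;
    Node prev, next;
    Node(int value) { this.value = value; }
}

class LinkedInsertionSort {
    // Sorts a doubly-linked list in place; returns the new head
    static Node sort(Node head, Comparator<Integer> cmp) {
        Node sorted = null;                    // head of the sorted prefix
        for (Node cur = head; cur != null; ) {
            Node next = cur.next;              // detach cur from the input list
            cur.prev = cur.next = null;
            if (sorted == null || cmp.compare(cur.value, sorted.value) < 0) {
                cur.next = sorted;             // insert before the current head
                if (sorted != null) sorted.prev = cur;
                sorted = cur;
            } else {
                Node p = sorted;               // walk to the last node <= cur
                while (p.next != null && cmp.compare(p.next.value, cur.value) <= 0)
                    p = p.next;
                cur.next = p.next;             // splice cur in after p
                if (p.next != null) p.next.prev = cur;
                p.next = cur;
                cur.prev = p;
            }
            cur = next;
        }
        return sorted;
    }
}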
This is why I asked for advice. I didn't include all the code because, across all the classes, it runs to around 3000 lines, most of which are unused by the test. I would not read 3k lines of code, nor do I expect the random Stack Overflow user to do so! However, I have included the code for BinaryHeap<T>, which is ~300 lines.
Code for BinaryHeap<T>:
import java.lang.reflect.Array;
import java.util.Vector;

public class BinaryHeap<T> implements Heap<T> {
private static final int indexOfMinimum = 1;
private Vector<T> storer = new Vector<T>(indexOfMinimum + 10);
private Relationship<T> relationship = null;
// The class Relationship<T> has a single method whose signature is
//     public boolean confront(T a, T b);
// The reason I use it instead of a Comparator is that, in my code, the value null represents -∞
//
// The following code works:
//
// public class Relationship<T>{
//     Comparator<T> c;
//     public Relationship(Comparator<T> c){ this.c = c; }
//     public boolean confront(T a, T b){ return a==null || c.compare(a,b) <= 0; }
// }
// TODO | Constructors
public BinaryHeap(Relationship<T> relationship){
storer.add(null);
this.relationship = relationship;
}
// TODO | Methods of the interface Heap
public void add(T newData) {
storer.add(null);
updateValue(storer.size() - 1, newData);
}
public T extractMin() {
    T min = storer.get(indexOfMinimum);
    // Move the last leaf into the root's place, shrink the heap, then restore the heap property
    AVector.swap(storer, indexOfMinimum, storer.size() - 1);
    storer.remove(storer.size() - 1);
    heapify(indexOfMinimum);
    return min;
}
public void updateValue(int indexOfToBeUpgraded, T newValue) {
    int i = indexOfToBeUpgraded;
    T support;
    if( i >= indexOfMinimum )
    {
        storer.set(i, newValue);
        // Bubble the new value up while it is smaller than its parent at i/2
        while( i > indexOfMinimum && ! relationship.confront(storer.get(i/2), storer.get(i)) )
        {
            support = storer.get(i);
            storer.set(i, storer.get(i/2));
            storer.set(i/2, support);
            i = i/2;
        }
    }
}
private void heapify(int i){
    int j = i;
    int lastIndex = storer.size() - 1; // index 0 holds the null sentinel
    while( 2*j <= lastIndex ) // i.e. while H[j] isn't a leaf of the Tree
    {
        int indexOfLocalMinimum;
        if( 2*j+1 > lastIndex ) // only the left child exists
            indexOfLocalMinimum = relationship.confront(storer.get(j), storer.get(2*j)) ? j : 2*j;
        else
            indexOfLocalMinimum = relationship.confront(storer.get(j), storer.get(2*j))
                ? ( relationship.confront(storer.get( j), storer.get(2*j+1)) ? j : 2*j+1 )
                : ( relationship.confront(storer.get(2*j), storer.get(2*j+1)) ? 2*j : 2*j+1 ) ;
        if( j != indexOfLocalMinimum )
        {
            AVector.swap(storer, j, indexOfLocalMinimum);
            j = indexOfLocalMinimum;
        }
        else break; // the heap property holds again
    }
}
// In my code this is actually a default method of the Heap<T> interface, reproduced here
public T[] convertHeapToArray(){
    int length = length(); // cached: extractMin() shrinks the heap at every call
    T[] output = (T[]) Array.newInstance(min().getClass(), length);
    for(int i=0 ; i<length ; i++)
        output[i] = this.extractMin();
    return output;
}
// TODO | Second version of convertHeapToArray(), which turned out not to be an improvement
public T[] convertHeapToArray_1(){
    int length = length(), j;
    T[] output = (T[]) Array.newInstance(min().getClass(), length);
    for(int i=0 ; i<length ; i++)
    {
        // output[i] = this.extractMin(); inlined by hand:
        output[i] = storer.get(indexOfMinimum);
        AVector.swap(storer, indexOfMinimum, storer.size() - 1);
        storer.remove(storer.size() - 1);
        // heapify(indexOfMinimum); inlined by hand:
        j = indexOfMinimum;
        int lastIndex = storer.size() - 1;
        while( 2*j <= lastIndex ) // i.e. while H[j] isn't a leaf of the Tree
        {
            int indexOfLocalMinimum;
            if( 2*j+1 > lastIndex ) // only the left child exists
                indexOfLocalMinimum = relationship.confront(storer.get(j), storer.get(2*j)) ? j : 2*j;
            else
                indexOfLocalMinimum = relationship.confront(storer.get(j), storer.get(2*j))
                    ? ( relationship.confront(storer.get( j), storer.get(2*j+1)) ? j : 2*j+1 )
                    : ( relationship.confront(storer.get(2*j), storer.get(2*j+1)) ? 2*j : 2*j+1 ) ;
            if( j != indexOfLocalMinimum )
            {
                AVector.swap(storer, j, indexOfLocalMinimum);
                j = indexOfLocalMinimum;
            }
            else break; // the heap property holds again
        }
    }
    return output;
}
}
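For completeness, this is how the heap is driven in tests 4-5 (a usage sketch, assuming the Relationship wrapper sketched in the comments above):

import java.util.Arrays;
import java.util.Comparator;

public class HeapSortDemo {
    public static void main(String[] args) {
        Integer[] data = { 7, 3, 9, 1, 5 };
        BinaryHeap<Integer> heap = new BinaryHeap<>(new Relationship<>(Comparator.naturalOrder()));
        for (Integer x : data)
            heap.add(x);                                  // build the heap with repeated add()
        Integer[] sorted = heap.convertHeapToArray();     // BH_2 path: n extractMin() calls
        // Integer[] sorted = heap.convertHeapToArray_1(); // BH_1 path: same code, inlined by hand
        System.out.println(Arrays.toString(sorted));      // [1, 3, 5, 7, 9]
    }
}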