1

We are doing some empirical testing of Java data structures and have got some results that we can't explain properly.

For example when we are testing TreeSet's last-method which time requirement should be constant we get a bump in running time after the size of TreeSet exceeds 30 000. We run the last-method with increasing number of elements in treeSet 1000 times for every size and then taken the median value of the results.

The relevant code is:

import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import jxl.write.WriteException;

public class TestRunner {


public void test(Testable testeCase, String outputFileName, Integer... initArgs){
    ExcelWriter excel = null;
    try {
        excel = new ExcelWriter(outputFileName);
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
    int measurementsPoints = 35;
    //calculate median to every dataset size

    for(int j = 0; j < measurementsPoints; j++){
        int testCount = 1000;
        long startTime; 
        long endTime; 
        //double sum = 0;
        ArrayList<Integer> results = new ArrayList<Integer>();


        for (int i = 0; i < testCount; i++) {

            //initialize tested data structure
            testeCase.initTestRun(initArgs);

            startTime = threadMxBean.getCurrentThreadCpuTime();
            // run tested method
            testeCase.runTestMethod();
            endTime = threadMxBean.getCurrentThreadCpuTime();

            results.add((int)(endTime - startTime));

        }
        Collections.sort(results);
        excel.addNumber(j, 5, new Double(initArgs[0]));
        excel.addNumber(j, 6, new Double(results.get(testCount / 2)));

        //increase the size of the data structure 10, 15, 20, 30, 40, 60, 80...
        if(j % 2 == 0){
            initArgs[0]  = (int)(initArgs[0]  * 1.5);
        }
        else{
            initArgs[0] = (int)(initArgs[0]  / 3 * 4);
        }


    }
    try {
        excel.write();
    } catch (WriteException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (IOException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}

import java.util.TreeSet;


public class TreeSetLastTest implements Testable {

private TreeSet<Integer> values;



@Override
public void initTestRun(Integer... integers) {
    Integer initialCapacity = integers[0];
    values = new TreeSet<Integer>();
    for(int i = Integer.MIN_VALUE; i < Integer.MIN_VALUE + initialCapacity; i++){
        values.add(i);
    }


}

@Override
public void runTestMethod() {
    values.last();
}

}

When the number of elements in the treeSet is between 10-30 000 elements the measured median is 3000 ns. When the size of the treeSet increases to 120 000 elements increases the measured median to 13 000 ns and then stays there when the number of elements increase over one million. So what could be the reason that causes the increase or is the time unit just so small that the differences are meaningless in reality. Thanks for the help.

corsiKa
  • 81,495
  • 25
  • 153
  • 204
Juho Salmio
  • 571
  • 4
  • 5
  • What memory does your computer have? Does swap memory come into play? Does Java's memory management start to use a different strategy? All of these could explain it. – Marcin May 08 '12 at 10:00
  • It does not answer your question but you should read about [how to write a micro benchmark](http://stackoverflow.com/questions/504103/how-do-i-write-a-correct-micro-benchmark-in-java). – assylias May 08 '12 at 10:04
  • 2
    Where do you get the idea that `last` in a TreeSet should be constant? A quick look in the source code shows that assumption erroneous. A TreeSet uses a TreeMap internally, which itself is a red-black tree implementation. Hence we have `O(log n)` for the get. – Voo May 08 '12 at 10:04

1 Answers1

4

Well I assume it's worth an answer.

Your assumption that a TreeSet has O(1) last() is erroneous. First the documentation doesn't state anything of this kind and in fact a TreeSet in java is implemented using a TreeMapwhich is an implementation of a red-black tree.

A red-black tree is similar to an AVL-tree which may be better known in that it guarantees O(log n) for lookups, i.e. makes sure the tree doesn't degenerate into a linked list. Basically your last() lookup has O(log n) complexity so will get worse as it gets larger.

Presumably because of caching, maybe even paging effects you don't see the O(log n) in your benchmark directly.

That's similar to LinkedLists and arrays - in theory linked lists have lots of things going for them, in practice linked lists are one of the worst data structures you can use on modern CPUs. Constant factors do matter after all and memory access patterns are large constant factors.

Voo
  • 29,040
  • 11
  • 82
  • 156