
I'm working on an empirical analysis of merge sort (sorting strings) for school, and I've run into a strange phenomenon that I can't explain or find an explanation of. I capture the running time using the built-in System.nanoTime() method, and for some reason, at a certain input size, the sort routine actually takes less time to execute than it does with a smaller input size.

My algorithm is just a basic merge sort, and my test code is simple too:

//Get current system time
long start = System.nanoTime();
//Perform mergesort procedure
a = q.sort(a);
//Calculate total elapsed sort time
long time = System.nanoTime()-start;

The output I got for elapsed time when sorting 900 strings was 3928492 ns; for 1300 strings it was 3541923 ns.

Both of those are averages over about 20 trials, so the result is fairly consistent. After 1300 strings, the execution time continues to grow as expected. I'm thinking there might be some peak input size where this phenomenon is most noticeable.

So my question: what might be causing this sudden increase in speed? I was thinking there might be some sort of optimization going on with arrays holding larger amounts of data, although 1300 items in an array is hardly large.

Some info:

  • Compiler: Java version 1.7.0_07
  • Algorithm: Basic recursive merge sort (using arrays)
  • Input type: Strings 6-10 characters long, shuffled (random order)

Am I missing anything?

Swquenzer
  • Did you sort the larger after sorting the smaller, or in two separate runs of the program? – arynaq Oct 25 '13 at 14:34
  • *Am I missing anything?* Yes: this is not a real micro-benchmark. Please follow the rules stated here: [How do I write a correct micro-benchmark in Java?](http://stackoverflow.com/q/504103/1065197). Note that after some iterations of your method, the JIT will trigger and the performance of your code will be optimized, so your code gets faster even when processing larger data. – Luiggi Mendoza Oct 25 '13 at 14:34
  • By the way, the *compiler* should be the JDK used. Eclipse is just an IDE and can work with different JDK versions. – Luiggi Mendoza Oct 25 '13 at 14:35
  • Do you store the Strings in an array or in some List object? – Sumit Gupta Oct 25 '13 at 14:35
  • Read through the link that @LuiggiMendoza posted. Also, 900->1300 Strings is really not a big change. – telkins Oct 25 '13 at 14:37
  • Thanks for the replies! As you can tell I'm new at this. Luiggi, I think that pretty much answers my question. @arynaq I ran the programs separately with multiple trials each. trevor-e: Yeah, I know the change isn't much, but the execution time is a noticeable difference in the larger graph of multiple inputs. – Swquenzer Oct 25 '13 at 14:39
  • I'd also wager to say that sorting very small inputs is not a good candidate for *micro*benchmarking. (When `nanotime()` makes sense.) Sorting algorithms are somewhat complex, so there's a lot of places where noise can creep in. To get sane results, you want the complexity of the algorithm to dominate over, say, JIT inlining `String.charAt()` or not. – millimoose Oct 25 '13 at 14:41
  • @LuiggiMendoza Eclipse has its own Java compiler that is not the same as the normal (Oracle) Java compiler – Mark Rotteveel Oct 25 '13 at 15:05
  • @MarkRotteveel you can configure which JDK to use for development. That also affects the Eclipse own compiler. – Luiggi Mendoza Oct 25 '13 at 15:09

1 Answer


Am I missing anything?

You're trying to do a microbenchmark, but the code you've posted doesn't yet resemble a well-constructed one. To write one, follow the rules stated here: How do I write a correct micro-benchmark in Java?.

The reason your code gets faster is that after some iterations of your method, the JIT compiler kicks in and optimizes the hot code paths, so later runs execute faster even when processing larger inputs.
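To see the warm-up effect in isolation, here is a small sketch. It uses `Arrays.sort` as a stand-in for the mergesort in the question (that code isn't shown), and the class and method names are made up for illustration. The first, "cold" call is typically much slower than a call made after thousands of warm-up iterations, once the JIT has compiled the hot path:

```java
import java.util.Arrays;
import java.util.Random;

public class JitWarmupDemo {
    // Times a single sort of a copy of the data (copying keeps the input
    // unsorted between calls). Arrays.sort stands in for the question's q.sort(a).
    static long timeOnce(String[] data) {
        String[] copy = Arrays.copyOf(data, data.length);
        long start = System.nanoTime();
        Arrays.sort(copy);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Random rnd = new Random(7);
        String[] data = new String[1300];
        for (int i = 0; i < data.length; i++) {
            data[i] = Integer.toString(rnd.nextInt(1_000_000));
        }

        long cold = timeOnce(data); // first call: likely still interpreted
        for (int i = 0; i < 10_000; i++) {
            timeOnce(data);         // warm-up: gives the JIT time to compile
        }
        long warm = timeOnce(data); // after warm-up: usually much faster
        System.out.println("cold=" + cold + "ns, warm=" + warm + "ns");
    }
}
```

On a typical HotSpot JVM the warm time is a fraction of the cold time, which is exactly the effect that can make a "larger" run appear faster than a "smaller" one when measurements are taken without warm-up.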

Some recommendations:

  • Use several array/list inputs of different sizes. Good values for this kind of analysis are 100, 1000 (1k), 10000 (10k), 100000 (100k), 1000000 (1m), plus random sizes in between. You will get more accurate results from measurements that take longer to run.
  • Use arrays/lists of different object types. Create a POJO, make it implement the Comparable interface, then run your sort method on it. As explained above, use inputs of several sizes.
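Putting those recommendations together, a minimal harness might look like the sketch below. The `Person` POJO and all names here are made up for illustration, and `Arrays.sort` again stands in for the mergesort being measured:

```java
import java.util.Arrays;
import java.util.Random;

public class PojoSortBench {
    // Hypothetical POJO implementing Comparable, as the recommendation suggests.
    static final class Person implements Comparable<Person> {
        final String name;
        Person(String name) { this.name = name; }
        @Override public int compareTo(Person o) { return name.compareTo(o.name); }
    }

    static Person[] randomPeople(int n, Random rnd) {
        Person[] a = new Person[n];
        for (int i = 0; i < n; i++) {
            a[i] = new Person(Long.toString(rnd.nextLong(), 36));
        }
        return a;
    }

    // Times one sort of a fresh copy, so every trial sees unsorted input.
    static long timeSort(Person[] input) {
        Person[] copy = Arrays.copyOf(input, input.length);
        long start = System.nanoTime();
        Arrays.sort(copy); // stand-in for the question's q.sort(a)
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Random rnd = new Random(0);

        // Warm-up phase: let the JIT compile the sort path before measuring.
        for (int i = 0; i < 2_000; i++) {
            timeSort(randomPeople(1_000, rnd));
        }

        // Measurement phase: average over many trials at each input size.
        int[] sizes = {100, 1_000, 10_000, 100_000};
        for (int size : sizes) {
            Person[] data = randomPeople(size, rnd);
            long total = 0;
            int trials = 20;
            for (int t = 0; t < trials; t++) {
                total += timeSort(data);
            }
            System.out.printf("n=%d avg=%dns%n", size, total / trials);
        }
    }
}
```

With warm-up separated from measurement like this, the averages at each size should grow monotonically, which is the shape you'd expect from an O(n log n) sort.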

Not directly related to your question, but execution results depend on the JDK used. Eclipse is just an IDE and can work with different JDK versions; for example, at my workplace I use JDK 6u30 for company projects, but for personal projects (like proofs of concept) I use JDK 7u40.

Luiggi Mendoza
  • There are also efforts to add automatic concurrent sorting into the JRE (I believe the feature is not yet released, but it might sneak in some time in the future). I wouldn't be surprised if that screwed over a simple benchmarking effort, too (See here http://docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html) – Durandal Oct 25 '13 at 16:01