19

I have a program I ported from C to Java. Both apps use quicksort to order some partitioned data (genomic coordinates).

The Java version runs fast, but I'd like to get it closer to the C version. I am using the Sun JDK v6u14.

Obviously I can't get parity with the C application, but I'd like to learn what I can do to eke out as much performance as reasonably possible (within the limits of the environment).

What sorts of things can I do to test performance of different parts of the application, memory usage, etc.? What would I do, specifically?

Also, what tricks can I implement (in general) to change the properties and organization of my classes and variables, reducing memory usage and improving speed?

EDIT: I am using Eclipse and would obviously prefer free options for any third-party tools. Thanks!

Alex Reynolds
  • Have you benchmarked the C version against the Java version? Using JIT techniques, the JVM can actually outperform native code in certain circumstances, depending on the machine code actually generated by the C compiler. – Martin OConnor Jun 02 '09 at 11:27
  • Depending on the nature of the data and how you choose your pivot in your quicksort algorithm, you may want to consider merge sort, as its worst case is O(n log n) – svachon Jun 02 '09 at 13:30
  • Try using the built-in quicksort to see how it compares with your implementation. You might be surprised :) – Thorbjørn Ravn Andersen Jun 02 '09 at 22:25
  • Some performance improvement tips that may also be relevant for Java here: https://stackoverflow.com/questions/32581644 – tkruse Jul 03 '19 at 10:10

14 Answers

34

Do not try to outsmart the JVM.

In particular:

  • don't try to avoid object creation for the sake of performance

  • use immutable objects where applicable.

  • use the scope of your objects correctly, so that the GC can do its job.

  • use primitives where you mean primitives (e.g. non-nullable int compared to nullable Integer); see the sketch after this list

  • use the built-in algorithms and data structures

  • when handling concurrency, use the java.util.concurrent package.

  • correctness over performance: first get it right, then measure, then measure with a profiler, then optimize.

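As a minimal sketch of the primitives point above (the element count and loop are invented purely for illustration):

    import java.util.ArrayList;
    import java.util.List;

    public class BoxingSketch {
        public static void main(String[] args) {
            // boxed: each element becomes a separate Integer object on the heap
            List<Integer> boxed = new ArrayList<Integer>();
            for (int i = 0; i < 1000000; i++) {
                boxed.add(i);   // autoboxing allocates (or looks up) an Integer per element
            }

            // primitive: one contiguous int[] with no per-element objects
            int[] primitive = new int[1000000];
            for (int i = 0; i < primitive.length; i++) {
                primitive[i] = i;
            }
        }
    }
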
Andreas Petersson
  • Can you explain what this means? I don't know if I'm doing anything to outsmart the JVM. – Alex Reynolds Jun 02 '09 at 09:41
  • 22
    While much of your advice may be true, I'd boldly disagree with the first point. Reducing object creation was the most crucial part of optimization in many of my projects. Of course, it's no use to save 10 big objects, but often you end up with millions of small objects, and not creating them is important. – Lena Schimmel Jun 02 '09 at 10:02
  • 6
    Andreas is largely correct; object allocation is extremely quick in modern VMs (a lot faster than C++, for example), and assuming they don't live long they'll be cleaned up in a minor collection. Having said that, profile, profile some more, only do things that you have evidence for, and not just because somebody said so. – Gareth Davis Jun 02 '09 at 10:18
  • 3
    Outsmarting the VM means trying to improve performance yourself by, for example, avoiding object creation. Leave these kinds of optimisations to the compiler and VM. Afterwards, perhaps connect a profiler such as VisualVM/JProfiler to see which parts of your code are most affecting your program's performance, then concentrate on improving them. – Rich Jun 02 '09 at 10:37
  • A profiler will tell you if object creation is a bottleneck. Integer.valueOf(int) is worth using instead of new Integer(int), for example, but this is a rare case. Inappropriate caching of objects, leading to them surviving long enough to be promoted out of Eden space, will result in a performance hit. – Bill Michell Jun 02 '09 at 11:04
  • 6
    @Brian: In Java 1.4 memory allocation for an object took less than 10 processor instructions. The magic with movable generational GC is that free memory is always contiguous, allocating 10 bytes is just returning the current free pointer and incrementing it by 10. Conversely, reusing old objects holding references to possibly newer objects will impact performance (possibly forcing memory moves). That is 'trying to outsmart the VM'. Depending on your object definitions and the code you use, you can improve or worsen performance. – David Rodríguez - dribeas Jun 02 '09 at 11:08
  • The first point should read: "don't try to avoid object creation for the sake of performance unless profiling tells you that you need to". – quant_dev Jun 02 '09 at 11:28
  • @quant_dev: this is what the last point (last resort) points toward – Andreas Petersson Jun 02 '09 at 12:48
  • @dribes That always seems a rather dubious boast, since some processor instructions ( such as lock cmpxchg ) take tens of cycles, so 10 instructions could be 10 cycles or 400. Secondly that is memory allocation - the JVM also initialises all values in objects and arrays, and for data intensive applications initialising an array to 0 then again to useful values doubles the run time compared to C. – Pete Kirkham Jun 02 '09 at 13:16
11

Obviously: profile, profile, profile. For Eclipse there's TPTP. Here's an article on the TPTP plugin for Eclipse. NetBeans has its own profiler. jvisualvm is nice as a standalone tool. (The entire dev.java.net server seems to be down at the moment, but it is very much an active project.)

The first thing to do is use the library sorting routine, Collections.sort; this will require your data objects to be Comparable. This might be fast enough and will definitely provide a good baseline.
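
As a rough sketch of that baseline (the Coordinate class and its single field are invented here, not taken from the question):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    // Hypothetical data class for illustration; the real fields will differ.
    public class Coordinate implements Comparable<Coordinate> {
        private final int position;

        public Coordinate(int position) {
            this.position = position;
        }

        public int compareTo(Coordinate other) {
            // explicit comparisons avoid overflow from subtraction tricks
            if (position < other.position) return -1;
            if (position > other.position) return 1;
            return 0;
        }

        public static void main(String[] args) {
            List<Coordinate> data = new ArrayList<Coordinate>();
            data.add(new Coordinate(42));
            data.add(new Coordinate(7));
            Collections.sort(data);   // library sort as a baseline to compare against
        }
    }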

General tips:

  • Avoid locks you don't need (your JVM may have already optimized these away)
  • Use StringBuilder (not StringBuffer, because of that lock thing I just mentioned) instead of concatenating String objects; see the sketch after this list
  • Make anything you can final; if possible, make your classes completely immutable
  • If you aren't changing the value of a variable in a loop, try hoisting it out and see if it makes a difference (the JVM may have already done this for you)
  • Try to work on an ArrayList (or even an array) so the memory you're accessing is contiguous instead of potentially fragmented the way it might be with a LinkedList
  • Quicksort can be parallelized; consider doing that (see quicksort parallelization)
  • Reduce the visibility and live time of your data as much as possible (but don't contort your algorithm to do it unless profiling shows it is a big win)
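
A small sketch of the StringBuilder point from the list above (the array of coordinates is invented for illustration):

    public class StringBuildSketch {
        public static void main(String[] args) {
            int[] coords = { 42, 7, 1200 };

            // concatenation in a loop creates a new String (and an implicit builder) per iteration
            String slow = "";
            for (int i = 0; i < coords.length; i++) {
                slow += coords[i] + "\n";
            }

            // reusing one StringBuilder does the same work with far fewer allocations
            StringBuilder fast = new StringBuilder();
            for (int i = 0; i < coords.length; i++) {
                fast.append(coords[i]).append('\n');
            }
            String result = fast.toString();

            System.out.println(slow.equals(result));   // true: same output, fewer allocations
        }
    }
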
Hank Gay
  • 1
    When escape analysis is in place and working 100% correctly, StringBuilder has the same performance characteristics as StringBuffer. I haven't tried it under Java 6u14 yet, so you should not worry about this too heavily. – Andreas Petersson Jun 02 '09 at 10:35
  • 1
    Concatenation of plain Strings is implemented by means of StringBuilder and optimized in many cases. That kind of micro-optimization is what made some of us use StringBuffers in the past, just to realize that with modern VMs plain String concatenation is faster than our hand-tailored code... who will refactor those StringBuffers to StringBuilder now? That is one of the examples of trying to outsmart the compiler/VM. – David Rodríguez - dribeas Jun 02 '09 at 11:12
  • Yes, switching from String concatenation to using a StringBuilder brought a huge performance increase in a few JSPs I was rendering. Good Point. – Kieveli Jun 02 '09 at 11:39
  • @Andreas: No use in hoping the JVM can figure out what you meant (I don't need "thread-safe" appends) when there is already a class that makes that intent explicit. StringBuffer won't handle a fair amount of the locking that a truly thread-safe program is going to need anyway (ordering issues, mostly). @dribeas: I realize that is the theory, but any number of things can cause the JVM to not perform that optimization. Also, StringBuffer -> StringBuilder is a trivial refactoring in any case where it is valid. – Hank Gay Jun 02 '09 at 12:23
5

Use a profiler.

Use the latest version of the JVM from your provider. Incidentally, Sun's Java 6 update 14 does bring performance improvements.

Measure your GC throughput and pick the best garbage collector for your workload.
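
For example (the class name MyApp is a placeholder, and the flags are only illustrative; the right collector depends on your workload and should be confirmed by measurement):

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps MyApp   # log GC activity to measure it
    java -XX:+UseParallelGC MyApp                                       # throughput (parallel) collector
    java -XX:+UseConcMarkSweepGC MyApp                                   # concurrent low-pause collector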

Robert Munteanu
4

Don't optimize prematurely.

Measure performance, then optimize.

Use final variables whenever possible. It will not only allow the JVM to optimize more, but also make your code easier to read and maintain.

If you make your objects immutable, you don't have to clone them.
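
A minimal sketch of such an immutable value class (the Interval name and fields are invented for illustration):

    // All fields are final and the class cannot be subclassed,
    // so instances can be shared freely without defensive copies.
    public final class Interval {
        private final int start;
        private final int end;

        public Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        public int getStart() { return start; }
        public int getEnd() { return end; }
    }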

Optimize by changing the algorithm first, then by changing the implementation.

Sometimes you need to resort to old-style techniques, like loop unrolling or caching precalculated values. Remember them; even if they don't look nice, they can be useful.

quant_dev
2

First caveat - make sure you have done appropriate profiling or benchmarking before embarking on any optimisation work. The results will often enlighten you, and nearly always save you a lot of wasted effort in optimising something that doesn't matter.

Assuming that you do need it, then you can get performance comparable to C in Java, but it takes some effort. You need to know where the JVM is doing "extra work" and avoid these.

In particular:

  • Avoid unnecessary object creation. While the JVM heap and GC are extremely fast and efficient (probably the best in the world, and almost certainly better than anything you could roll yourself in C), it is still heap allocation, and that will be beaten by avoiding the heap in the first place (stack or register allocation)
  • Avoid boxed primitives. You want to be using double and not Double.
  • Use primitive arrays for any big chunks of data. Java primitive arrays are basically as fast as C/C++ arrays (they do have an additional bounds check, but that is usually insignificant); see the sketch after this list
  • Avoid anything synchronized - Java threading is pretty decent but it is still overhead that you may not need. Give each thread its own data to work on.
  • Exploit concurrency - Java's concurrency support is very good. You might as well use all your cores! This is a big topic but there are plenty of good books / tutorials available.
  • Use specialised collection classes for certain types of data if you have some very specific requirements, e.g. supporting some specialised sorting/search algorithms. You may need to roll your own, but there are also some good libraries with high performance collection classes available that may fit your needs - see e.g. Javolution
  • Avoid big class hierarchies - this is a design smell in performance code. Every layer of abstraction is costing you overhead. Very fast Java code will often end up looking rather like C....
  • Use static methods - the JIT can optimise these extremely well. It will usually inline them.
  • Use final concrete classes - again, the JIT can optimise these very well by avoiding virtual function calls.
  • Generate your own bytecode - if all else fails, this can be a viable option if you want the absolute maximum performance out of the JVM. Particularly useful if you need to compile your own DSL. Use something like ASM.
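
As a sketch of the primitive-array point from the list above (the array contents are invented; real data would be millions of entries): keeping the data in a long[] rather than a List<Long> avoids an object header and a pointer indirection per element, and Arrays.sort works on it directly.

    import java.util.Arrays;

    public class PrimitiveArraySketch {
        public static void main(String[] args) {
            // hypothetical block of genomic positions, stored as plain longs
            long[] positions = { 1200L, 35L, 978L, 42L };

            // tuned quicksort on primitives: no boxing, contiguous memory access
            Arrays.sort(positions);

            System.out.println(Arrays.toString(positions));
        }
    }
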
mikera
2

jvisualvm ships with JDK 6 now - that's the reason the link cited above doesn't work. Just type "jvisualvm <pid>", where <pid> is the ID of the process you want to track. You'll get to see how the heap is being used, but you won't see what's filling it up.

If it's a long-running process, you can turn on the -server option when you run. There are a lot of tuning options available to you; that's just one.
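
For example (MyApp and the heap sizes are placeholders, not recommendations):

    java -server -Xms512m -Xmx2g MyApp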

duffymo
2

Also try tweaking the runtime arguments of the VM - the latest release of the VM, for example, includes the following flag, which can improve performance in certain scenarios.

-XX:+DoEscapeAnalysis 
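
It is passed on the command line like any other VM option (MyApp is a placeholder main class):

    java -XX:+DoEscapeAnalysis MyApp
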
Rich
1

If your algorithm is CPU-heavy, you may want to consider taking advantage of parallelisation. You may be able to sort in multiple threads and merge the results back later.

This is however not a decision to be taken lightly, as writing concurrent code is hard.
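
A rough sketch of that idea using plain java.util.concurrent (the input array and a fixed two-way split are invented for illustration; a real implementation would partition by available cores and handle failures):

    import java.util.Arrays;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelSortSketch {
        public static void main(String[] args) throws Exception {
            final int[] data = { 9, 4, 7, 1, 8, 3, 6, 2 };
            final int mid = data.length / 2;

            ExecutorService pool = Executors.newFixedThreadPool(2);

            // sort each half of the array in its own thread
            Future<?> left = pool.submit(new Runnable() {
                public void run() { Arrays.sort(data, 0, mid); }
            });
            Future<?> right = pool.submit(new Runnable() {
                public void run() { Arrays.sort(data, mid, data.length); }
            });
            left.get();
            right.get();
            pool.shutdown();

            // merge the two sorted halves into one result array
            int[] merged = new int[data.length];
            int i = 0, j = mid, k = 0;
            while (i < mid && j < data.length) {
                merged[k++] = (data[i] <= data[j]) ? data[i++] : data[j++];
            }
            while (i < mid) {
                merged[k++] = data[i++];
            }
            while (j < data.length) {
                merged[k++] = data[j++];
            }

            System.out.println(Arrays.toString(merged));
        }
    }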

Simon Nickerson
1

Can't you use the sort functions that are included in the Java library?

You could at least look at the speed difference between the two sorting functions.

Peter Stuifzand
  • The comparator I am using is customized to deal with ordering a genomic and positional data structure. – Alex Reynolds Jun 02 '09 at 10:01
  • @Alex If you make your data objects implement [Comparable](http://java.sun.com/javase/6/docs/api/java/lang/Comparable.html) you can still use the library sort. – Hank Gay Jun 02 '09 at 10:29
  • @Hank: why can't @Alex use the overloaded sort method that takes a Comparator? – Miserable Variable Jun 02 '09 at 10:40
  • @Hemal The code is cleaner when you use the natural sort: no need to create a comparator and fewer arguments to pass. Of course, if Alex's sort criteria don't make sense as the natural order, the Comparator version is the way to go. – Hank Gay Jun 02 '09 at 12:30
0

Profile and tune your Java program and host machine. Most code follows the 80/20 rule: 20% of the code runs 80% of the time, so find that 20% and make it as fast as possible. For example, the article Tuning Java Servers (http://www.infoq.com/articles/Tuning-Java-Servers) describes drilling down from the command line and then isolating the problem using tools like Java Flight Recorder, Eclipse Memory Analyzer, and JProfiler.

srinath_perera
0

Methodologically, you have to profile the application and then get an idea of which components of your program are time- and memory-intensive; then take a closer look at those components in order to improve their performance (see Amdahl's law).

From a purely technological point of view, you can use a Java-to-native-code compiler, like Excelsior JET, but I have to note that recent JVMs are really fast, so the VM should not have a significant impact.
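
For reference, Amdahl's law bounds the gain: if a fraction p of the running time is sped up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). For example, making 80% of the run time 4x faster yields 1 / (0.2 + 0.8 / 4) = 2.5x overall.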

akappa
0

Is your sorting code executing only once, e.g. in a commandline utility that just sorts, or multiple times, e.g. a webapp that sorts in response to some user input?

Chances are that performance would increase significantly after the code has been executed a few times because the HotSpot VM may optimize aggressively if it decides your code is a hotspot.

This is a big advantage compared to C/C++.

The VM, at runtime, optimizes code that is used often, and it does that quite well. Performance can actually rise beyond that of C/C++ because of this. Really. ;)

Your custom Comparator could be a place for optimization, though.

Try to check inexpensive stuff first (e.g. int comparison) before more expensive stuff (e.g. String comparison). I'm not sure if those tips apply because I don't know your Comparator.
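
For instance, a sketch of a cheap-fields-first Comparator (the Region class and its fields are invented here, not the asker's actual structure):

    import java.util.Comparator;

    // Hypothetical record type for illustration only.
    class Region {
        final int chromosomeIndex;   // cheap int key
        final long start;            // cheap long key
        final String name;           // expensive String key

        Region(int chromosomeIndex, long start, String name) {
            this.chromosomeIndex = chromosomeIndex;
            this.start = start;
            this.name = name;
        }
    }

    class RegionComparator implements Comparator<Region> {
        public int compare(Region a, Region b) {
            // compare the cheap numeric fields first ...
            if (a.chromosomeIndex != b.chromosomeIndex) {
                return a.chromosomeIndex < b.chromosomeIndex ? -1 : 1;
            }
            if (a.start != b.start) {
                return a.start < b.start ? -1 : 1;
            }
            // ... and only fall back to the expensive String comparison on ties
            return a.name.compareTo(b.name);
        }
    }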

Use either Collections.sort(list, comparator) or Arrays.sort(array, comparator). The array variant will be a bit faster; see the respective documentation.

As Andreas said before: don't try to outsmart the VM.

Huxi
0

Perhaps there are routes to performance enhancement other than micro-optimization of the code. How about a different algorithm to achieve what you want your program to do? Maybe a different data structure?

Or trade some disk/RAM space for speed. Or, if you can give up some time upfront during the loading of your program, you can precompute lookup tables instead of doing calculations - that way, the processing is fast. That is, trade off the other resources you have available.
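
A small sketch of the precompute-a-lookup-table idea (the base-complement table is just an invented example):

    public class ComplementTable {
        // Precomputed once at class-loading time: a single array lookup per base
        // at runtime instead of a branch or switch per call.
        private static final char[] COMPLEMENT = new char[128];
        static {
            COMPLEMENT['A'] = 'T';
            COMPLEMENT['T'] = 'A';
            COMPLEMENT['C'] = 'G';
            COMPLEMENT['G'] = 'C';
        }

        public static char complement(char base) {
            return COMPLEMENT[base];
        }
    }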

Chii
0

Here's what I would do, in any language. If samples show that your sort-comparison routine is active a large percentage of the time, you might find a way to simplify it. But maybe the time is going elsewhere. Diagnose first, to see what's broken, before you fix anything. Chances are, if you fix the biggest thing, then something else will be the biggest thing, and so on, until you've really gotten a pretty good speedup.

Mike Dunlavey