3

I am writing a set of visual interfaces to data structures in Java. The idea is that these classes should be high performance implementations of algorithms, but with embedded hooks so that the algorithms can be displayed interactively.

There are a number of reasons to do this, but if you will accept this request at face value, I want to embed calls in the middle of algorithms to identify that a particular subsection has just been done. For example, one pass of a sorting algorithm.

I would prefer that the library be both very efficient and allow this. In C++, I would plug in two different templates or use conditional compilation, and two versions of the code could reasonably be generated. Is there any way to achieve this in Java? I'm hoping someone else can come up with one because I can't.

NEWS FLASH. I tried this actual code.

For n=100,000 an insertion sort takes approximately 9800 ms with VISUALIZE as a static variable but not final, vs. about 3100 with it commented out. So the performance loss is unacceptable.

With visualize as static final, the optimizer does indeed detect it and chop it out, but given that it's final what can I do with it? I can't dynamically turn visualization on and off!

public class TestSort {
    private static boolean VISUALIZE = false; 
    private static ArrayObserver ao;
    public static void insertionSort(int[] x) {
        for (int i = 1; i < x.length; i++) {
            int temp = x[i];
            int j = i - 1;
            if (x[j] > temp) {
                do {    
                    x[j+1] = x[j];
/*              if (VISUALIZE) {
                        ao.compare(i-1, i);
            ao.copy(i-1, i);
            }*/
                } while (--j >= 0 && x[j] > temp);
                x[j+1] = temp;
            }
        }
    }
    static Random r = new Random();
    static int[] createRandomArray(int n) {
    int[] x = new int[n];
    for (int i = 0; i < x.length; i++)
        x[i] = r.nextInt(100);
        return x;
    }
    static void display(int[] x) {
    for (int i = 0; i < x.length; i++)
        System.out.print(x[i] + " ");
        System.out.println();
    }
    public static void main(String args[]) {
    //int[] x = {9, 8, 7, 6, 5, 4, 3, 2, 1};
    int [] x = createRandomArray(100000);
    ao = new ArrayObserver(x);
    if (x.length < 20) display(x);
    long t0 = System.currentTimeMillis();
    insertionSort(x);
    long t1 = System.currentTimeMillis();
    if (x.length < 20) display(x);
    System.out.println(t1-t0);  
    }
}
Dov
  • 8,000
  • 8
  • 46
  • 75
  • Why do you care if they are *compile* time variants. With a JIT compiler the distinction between compile time and run time code generation is fuzzy. – Raedwald Dec 13 '13 at 08:25
  • See http://stackoverflow.com/questions/8293124/how-to-mark-java-code-such-that-its-not-compiled – Raedwald Dec 13 '13 at 08:25
  • 1
    I want to have no overhead for callbacks in tight algorithms. However, I wish to have the algorithm simultaneously generate a slower version which can call the viewer to display how it works, to visualize the algorithm. I want two copies of the library, the fast one and the "slow" one to be generated. – Dov Dec 13 '13 at 14:40
  • 3
    See [here for a brilliant usage of ant](https://weblogs.java.net/blog/schaefa/archive/2005/01/how_to_do_condi.html); also, you can use the C/C++ preprocessor on your Java code... – dcsohl Dec 13 '13 at 14:57
  • The JIT compiler can inline method calls, eliminating the overhead fir callbacks. – Raedwald Dec 13 '13 at 22:26

5 Answers5

5

I think you have a few options:

  • Put a test in on a static final constant (e.g. a boolean), which conditionally executes the the visual interface code. The JVM will eliminate the dead code when you set the constant to "off", and your code will be fully optimised. The downside is, of course, that you can only switch this at compile time, but this may be fine if you actually want to build two copies of the library.
  • Add an extra parameter to the function that determines whether the visual interface is called, and test this where necessary in your algorithm. This will add a small amount of runtime overhead, but may well be acceptable. I suggest you benchmark this, in my experience though such tests on a local variable are usually sufficiently cheap that you can get away with it (in CPU terms, will probably be just a register test which is likely to be even cheaper than the cost of a single memory access into an int[] array...)
  • Use a higher level / meta-language to express the algorithms, and use code-generation techniques to generate the actual code you need (with or without). I've done stuff like this in Clojure, for example. It's also an option to generate bytecode directly with tools like ASM (if all you care about is execution, and don't need the Java source code).
  • Use a textual pre-processor. It should work fine for Java code generation (as it does for C/C++), though it's not such a common approach and you may find it a bit fragile. You may need to do some clever stuff like generating different class names in the file system etc.
mikera
  • 105,238
  • 25
  • 256
  • 415
  • Take a look at the code above. Would you want an extra call in the middle of the sort? Even a test is not going to be pleasant. No, I want to completely cut it out, and generate two versions of the library, calling one or the other. I really don't think that's so unreasonable. I would love to use static, but I can't change it at run time, so what's the point? – Dov Dec 13 '13 at 15:33
  • A test on a register which is correctly branch-predicted is *exceptionally* cheap on modern CPUs. I suggest benchmarking it, but it is unlikely to be a bottleneck. – mikera Dec 13 '13 at 15:51
  • @mikera look at his code: the boolean check is only one of two things performed in the inner loop (the other is an assignment) so I think it can become relevant when executed 10^10 times. – herman Dec 16 '13 at 17:44
  • @herman - I suggest benchmarking before jumping to conclusions. I did a benchmark of an equivalent inner loop (with one `x[j+1]=x[j]`). Adding an extra boolean test as I described makes no measurable difference to runtime (in fact the code with the test sometimes runs faster: if there is any difference it is less than the statistical noise). This pretty much demonstrates my point: a correctly branch predicted register test on modern CPUs is so cheap it's effectively free. – mikera Dec 16 '13 at 19:13
2

Contrary to e.g. C++, the final compilation to native machine code is done at runtime, in this case completely eliminating the need for building two separate versions for performance reasons.

If you pass the boolean to enable/disable the extra calls as a parameter to the constructor of the class that implements your algorithm and store it in a final class variable (i.e. a constant), when the algorithm gets executed in a tight loop (= a 'hot spot') the Hotspot VM will compile the class instance and remove the dead code. This kind of runtime optimizations can't be done with C++.

But note that a boolean test is likely to cost only a very small fraction of the total algorithm.

EDIT: your tests have shown that this doesn't work, although I'm not sure they are done correctly. You are not using any benchmarking framework. The most aggressive optimizations will happen with the server VM (-server) and then code must be properly warmed up first (the first 10000 or so iterations will happen uncompiled which is of course much slower). Also, using a template pattern may have better chance of being optimized than a final boolean, as the boolean check is cheap anyway and the compiler is known to do virtual call inlining (as far as I know).

EDIT2: if you don't need to switch at runtime (after all conditional compilation and separate builds won't help you there either) just go with the static final boolean which you know gets optimized. Initialize it with the value from a command line argument or config file and you can easily switch between both versions at application start time.

herman
  • 11,740
  • 5
  • 47
  • 58
  • Could you give us some hard test data that shows that the VM is properly dealing with the dead code in such a situation? A couple of years back tests I saw told that Java was pretty bad at this sort of thing, and I'd like to know if this has changed. – SáT Dec 13 '13 at 15:51
  • Ok, this is the magic I was looking for. I am going to try to instantiate one sort object with a final VISUALIZE = true, and another with it false. For static it clearly works, but if it works for objects, this thing is done! – Dov Dec 14 '13 at 04:02
  • And... no! It's a great idea, but it doesn't work. The JIT compiler recognizes static final variables, but not final variables that are instantiated at object creation time. The same code turned into an object took 10,000 milliseconds to run. Interestingly, it dropped to 7000 when I commented out the visualization test, so something else is taking longer too. The bottom line though is that Java is definitely not doing as well as a compile-time template mechanism or preprocessor in C++. – Dov Dec 14 '13 at 04:22
  • Was the loop warmed up so that it was actually compiled? Also using the server vm could help, although the warm up time is longer, but optimization is more aggressive. – herman Dec 15 '13 at 09:01
  • @Dov Also try using a separate subclass instead of a Boolean (template pattern) as the optimization (with virtual call inlining) can be per class instead of per object instance then. – herman Dec 15 '13 at 09:22
  • @herman Using a separate subclass seems reasonable, and I will try that. But if you don't accept running a 100,000 element insertion sort, I don't know what kind of "framework" you need. This thing is iterating on the order of 10^10 times, and if the JIT compiler doesn't get this it's not going to get anything. If you really think you can do better, I can post the code and you can try – Dov Dec 16 '13 at 02:04
  • @Dov I'm not saying your test result is definitely off, but I know frameworks like JMH and Caliper (Linux-only) exist for a reason. The number of iterations is definitely enough, but you are for example not discarding the early slow (non-compiled) iterations or the iterations interrupted by a compilation event. Also you may be using the client VM which is known to do less optimization than the server VM (but does it more eagerly) – herman Dec 16 '13 at 17:48
1

If you do:

private static final ENABLED = false;

// Then later...
if(ENABLED){
    call();
}

The whole if-block will not be even included in the generated bytecode (at least on newer JVMs). Would that be an option?

Enno Shioji
  • 26,542
  • 13
  • 70
  • 109
  • How do I turn ENABLED on and off without changing the source code since it's final??? – Dov Dec 13 '13 at 15:30
0

The reason why Java doesn't have a preprocessor is to prevent programmers from doing exactly what you are trying to do. Instead, you should write Java code to implement the functionality you would otherwise implement in the preprocessor directives and let the compiler optimize the generated code. This way, you will end up with code written in a single language (instead of two languages, Java and preprocessor DSL), making things a lot easier for the toolchain you use when analyzing your code.

One way to solve your problem in pure Java code is to use the Template method pattern.

npclaudiu
  • 2,401
  • 1
  • 18
  • 19
  • The point is to be able to call the efficient version, or the non-efficient version. Having a dead call in the center of an algorithm is a terrible idea. – Dov Dec 13 '13 at 15:31
  • 1
    @Dov No. See my answer: just like with a boolean check, the dead call can be removed at runtime when the class instance is compiled and the compiler knows that the virtual call is a dead one. In C++ you are right because there at compile time this isn't known. – herman Dec 13 '13 at 15:32
  • @herman. See my test. The JIT cannot do this. Nice in theory, but it doesn't work. – Dov Dec 16 '13 at 02:02
0

As this guy claims to prove, it is possible to modify the bytecode to set a final variable on runtime. I guess you could use the same aproach to set static final VISUALIZE on and off.

Community
  • 1
  • 1
Elist
  • 5,313
  • 3
  • 35
  • 73