
My general experience with Java 7 tells me that it is faster than Java 6. However, I've run into enough information that makes me believe that this is not always the case.

The first bit of information comes from Minecraft Snooper data found here. My intention was to look at that data to determine the effects of the different switches used to launch Minecraft. For example, I wanted to know if using -Xmx4096m had a negative or positive effect on performance. Before I could get there I looked at the different versions of Java being used. It covers everything from 1.5 to a developer using 1.8. In general, as you increase the Java version you see an increase in fps performance. Throughout the different versions of 1.6 you even see this gradual trend up. I honestly wasn't expecting to see as many different versions of Java still in the wild, but I guess people don't run the updates like they should.

Somewhere around the later versions of 1.6 you get the highest peaks. 1.7 performs about 10 fps on average below the later versions of 1.6, but still higher than the early versions of 1.6. On a sample from my own system it's almost impossible to see the difference, but when looking at the broader sample it's clear.

To control for the possibility that someone might have found a magic switch for Java, I controlled by only looking at the data where no switches were being passed. That way I'd have a reasonable control before I started looking at the different flags.

I dismissed most of what I was seeing, as it could be some magic Java 6 configuration that someone just wasn't sharing with me.

Now I've been working on another project that requires me to wrap a byte array in an InputStream to be processed by another API. Initially I used a ByteArrayInputStream because it would work out of the box. When I looked at the code for it I noticed that every method was synchronized. Since this was unnecessary for this project, I wrote a version with the synchronization stripped out. I then decided that I wanted to know what the general cost of synchronization was for me in this situation.
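For context, the JDK's ByteArrayInputStream declares its read methods synchronized, so every call acquires the stream's monitor lock even when only one thread is involved. A minimal sketch of the contrast (a simplified illustration, not the actual JDK source):

```java
// Simplified sketch: the same single-byte read, with and without locking.
class SyncStream {
    private final byte[] buf;
    private int pos;

    SyncStream(byte[] buf) { this.buf = buf; }

    // Like ByteArrayInputStream: every call takes the monitor lock.
    public synchronized int read() {
        return (pos < buf.length) ? (buf[pos++] & 0xFF) : -1;
    }
}

class PlainStream {
    private final byte[] buf;
    private int pos;

    PlainStream(byte[] buf) { this.buf = buf; }

    // No lock: only safe when a single thread uses the stream.
    public int read() {
        return (pos < buf.length) ? (buf[pos++] & 0xFF) : -1;
    }
}
```

On modern JVMs an uncontended lock is cheap (often elided by biased locking or escape analysis), which is part of why measuring this difference is tricky.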

I mocked up a simple test just to see. I timed everything with System.nanoTime() and used Java 1.6.0_20 x86, 1.7.0-b147 AMD64, and 1.7.0_15 AMD64, using the -server flag. I expected the AMD64 versions to outperform based on architecture alone and to have any Java 7 advantages. I also looked at the 25th, 50th, and 75th percentiles (blue, red, green). However, 1.6 with no -server beat the pants off of every other configuration. graph

So my question is: what is in the 1.6 -server option that impacts performance and is also defaulted to on in 1.7?

I know most of the speed enhancements in 1.7 came from defaulting some of the more radical performance options in 1.6 to on, but one of them seems to be causing this performance difference. I just don't know which ones to look at.

public class ByteInputStream extends InputStream {

public static void main(String args[]) throws IOException {
    String song = "This is the song that never ends";
    byte[] data = song.getBytes();
    byte[] read = new byte[data.length];
    ByteArrayInputStream bais = new ByteArrayInputStream(data);
    ByteInputStream bis = new ByteInputStream(data);

    long startTime, endTime;

    for (int i = 0; i < 10; i++) {
        /*code for ByteInputStream*/
        /*
        startTime = System.nanoTime();
        for (int ctr = 0; ctr < 1000; ctr++) {
            bis.mark(0);
            bis.read(read);
            bis.reset();
        }
        endTime = System.nanoTime(); 

        System.out.println(endTime - startTime); 
        */

        /*code for ByteArrayInputStream*/
        startTime = System.nanoTime();
        for (int ctr = 0; ctr < 1000; ctr++) {
            bais.mark(0);
            bais.read(read);
            bais.reset();
        }
        endTime = System.nanoTime();

        System.out.println(endTime - startTime);
    }

}

private final byte[] array;
private int pos;
private int min;
private int max;
private int mark;

public ByteInputStream(byte[] array) {
    this(array, 0, array.length);
}

public ByteInputStream(byte[] array, int offset, int length) {
    min = offset;
    max = offset + length;
    this.array = array;
    pos = offset;
}

@Override
public int available() {
    return max - pos;
}

@Override
public boolean markSupported() {
    return true;
}

@Override
public void mark(int limit) {
    mark = pos;
}

@Override
public void reset() {
    pos = mark;
}

@Override
public long skip(long n) {
    // Clamp to the remaining bytes and return the number actually
    // skipped, as the InputStream contract requires.
    long k = Math.min(n, (long) (max - pos));
    if (k < 0) {
        k = 0;
    }
    pos += k;
    return k;
}

@Override
public int read() throws IOException {
    if (pos >= max) {
        return -1;
    }
    return array[pos++] & 0xFF;
}

@Override
public int read(byte b[], int off, int len) {
    if (pos >= max) {
        return -1;
    }
    if (pos + len > max) {
        len = max - pos;
    }
    if (len <= 0) {
        return 0;
    }
    System.arraycopy(array, pos, b, off, len);
    pos += len;
    return len;
}

@Override
public void close() throws IOException {
}

}// end class
Carl Manaster
medv4380
    +1 for a very nice graph and well-documented question. – Philip Tenn Feb 28 '13 at 21:23
    I have the opposite opinion. There's a lot of verbiage here, and a pretty graph, but no real content. If you had just shown your code and described the results in a table, more people would be able to help you. As it is, all you're likely to get is guesses. -1 if I could. – parsifal Feb 28 '13 at 21:26
  • What's on Y axis? Have you measured multiple repeated tasks, not a single one? Are you sure that the GC isn't meddling in the results? – Dariusz Feb 28 '13 at 21:27
  • What are these measurements of? –  Feb 28 '13 at 21:31
    You posted a lot of data and information, but not much of it seems relevant. The more I read your question (and I've read it several times) the less I understand your problem. Post code, post your measurement loop, post the section being measured. Other than that: PROFILE YOUR CODE and check what's taking how much time. – Dariusz Feb 28 '13 at 21:35
    This question could have been a *lot* shorter. Show us the code in question and share more information about your benchmarking process. It's *really* easy to do that very badly. – Duncan Jones Feb 28 '13 at 22:03
  • I've added the code if you think it will help. However, it's common knowledge that 7 incorporated some of the -server options into -client. I suspect it's something like Escape Analysis that might not help code that cannot be optimized, and is clearly slowing it down in some situations. The question is, which one causes this behavior? – medv4380 Feb 28 '13 at 22:49
  • This is all very weird, because in my experience, new, performance-optimized releases of systems are always faster in every single testcase (existing and future), without any regressions whatsoever. – Kaz Mar 01 '13 at 06:10
  • "A simple test"... Are you absolutely certain that your simple test is representative of real work so you are not micro benchmarking? – Thorbjørn Ravn Andersen Mar 01 '13 at 06:20
  • * You have no idea what locale is being used. This is always bad; you don't really know what is in your byte array. Hell, it might even be the whole reason for the problem - you may be getting a one-byte encoding in the first case and two-byte encodings in the other cases. * Your string should be longer - as it is right now, probably 1/4 of the time is spent handling the loop itself. * How can you analyze percentiles if you do only 10 tests per case? * Increase your loop counts to at least 1000 tests and 100000 repeats per test. – Dariusz Mar 01 '13 at 06:51
  • In short: this test isn't really measuring anything relevant. – eis Mar 01 '13 at 11:31

1 Answer


I think, as the others are saying, that your tests are too short to see the core issues - the graph is showing nanoTime, and that implies the core section being measured completes in 0.0001 to 0.0006s.

Discussion

The key difference in -server and -client is that -server expects the JVM to be around for a long time and therefore expends effort early on for better long-term results. -client aims for fast startup times and good-enough performance.

In particular, with -server, HotSpot runs more optimizations, and these take more CPU to execute. In other words, you may be seeing the cost of the optimizer outweighing any gains from the optimization.

See Real differences between "java -server" and "java -client"?

Alternatively, you may be seeing the effects of tiered compilation where, in Java 7, HotSpot doesn't kick in so fast. With only 1000 iterations, the full optimization of your code won't happen until later, and the benefits will therefore be smaller.

You might get insight if you run java with the -Xprof option: the JVM will dump some data about the time spent in various methods, both interpreted and compiled. It should give an idea about what was compiled, and the ratio of (CPU) time before HotSpot kicked in.

However, to get a true picture, you really need to run this for much longer - minutes, not milliseconds - to allow Java and the OS to warm up. It would be even better to loop the test in main (so you have a loop containing your instrumented main test loop) so that you can ignore the warm-up.
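A sketch of that structure, reusing the question's ByteArrayInputStream setup. The iteration counts here are illustrative only and would need to be far larger (minutes of run time) for a real measurement:

```java
import java.io.ByteArrayInputStream;

public class WarmupBench {

    // One instrumented round: the same mark/read/reset loop as in the question.
    static long runOnce(ByteArrayInputStream bais, byte[] read) {
        long start = System.nanoTime();
        for (int ctr = 0; ctr < 1000; ctr++) {
            bais.mark(0);
            bais.read(read, 0, read.length);
            bais.reset();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] data = "This is the song that never ends".getBytes();
        byte[] read = new byte[data.length];
        ByteArrayInputStream bais = new ByteArrayInputStream(data);

        // Loop the instrumented loop and discard the early rounds, so the
        // JIT-compiled steady state dominates the reported number. Increase
        // these counts substantially for a real measurement.
        final int rounds = 2000, warmup = 500;
        long total = 0;
        for (int i = 0; i < rounds; i++) {
            long elapsed = runOnce(bais, read);
            if (i >= warmup) {
                total += elapsed;
            }
        }
        System.out.println("mean ns per round: " + total / (rounds - warmup));
    }
}
```

This way the warm-up rounds pay the compilation cost, and only steady-state rounds contribute to the reported mean.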

EDIT Changed seconds to minutes to ensure that hotspot, the jvm and the OS are properly 'warmed up'

Andrew Alcock
  • and if you want to measure some runtime performance of some specific app, often those run for hours... and web apps run 24/7. Sometimes stressed with load, sometimes not. That's the relevant use case, not what happens in the first milliseconds or seconds. – eis Mar 01 '13 at 11:27
  • @eis: I hear you. Here, we have a very small section of code to profile (rather than a web app) with only one code path through (unlike any real code), and it's CPU bound (rather than having any I/O effects) so hours might be overkill *in this case*. I am in complete concordance with you on millisecond scale. – Andrew Alcock Mar 01 '13 at 12:40