10

So I've been led to believe that using the "+" operator to append Strings on a single line is just as efficient as using a StringBuilder (and definitely much nicer on the eyes). Today, though, I was having some speed issues with a Logger that was appending variables and strings using the "+" operator. So I made a quick test case and, to my surprise, found that using a StringBuilder was quicker!

The basics: I used the average of 20 runs for each number of appends, with 4 different methods (shown below).

Results, times (in milliseconds)

                                               # of Appends
                           10^1    10^2    10^3    10^4     10^5     10^6       10^7
StringBuilder(capacity)    0.65    1.25    2       11.7     117.65   1213.25    11570
StringBuilder()            0.7     1.2     2.4     12.15    122      1253.7     12274.6
"+" operator               0.75    0.95    2.35    12.35    127.2    1276.5     12483.4
String.format              4.25    13.1    13.25   71.45    730.6    7217.15    -

Graph of the percentage difference from the fastest algorithm at each append count:

[graph: % difference in String timings]

I've checked out the byte code; it's different for each of the string-building methods.

Here is what I'm using for the methods, and you can see the whole test class here.

public static String stringSpeed1(float a, float b, float c, float x, float y, float z){
    StringBuilder sb = new StringBuilder(72).append("[").append(a).append(",").append(b).append(",").append(c).append("][").
            append(x).append(",").append(y).append(",").append(z).append("]");
    return sb.toString();
}

public static String stringSpeed2(float a, float b, float c, float x, float y, float z){
    StringBuilder sb = new StringBuilder().append("[").append(a).append(",").append(b).append(",").append(c).append("][").
            append(x).append(",").append(y).append(",").append(z).append("]");
    return sb.toString();
}

public static String stringSpeed3(float a, float b, float c, float x, float y, float z){
    return "["+a+","+b+","+c+"]["+x+","+y+","+z+"]";
}

public static String stringSpeed4(float a, float b, float c, float x, float y, float z){
    return String.format("[%f,%f,%f][%f,%f,%f]", a,b,c,x,y,z);
}

I've now tried with floats, ints, and Strings, all of which show more or less the same time differences.
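
Roughly, the kind of timing loop involved is sketched below; this is only an illustration, not the actual linked test class (the method name averageMillis and the use of System.nanoTime are just for the example):

public static double averageMillis(int appends) {
    // Average of 20 runs, matching the methodology described above.
    final int RUNS = 20;
    java.util.Random r = new java.util.Random();
    long totalNanos = 0;
    for (int run = 0; run < RUNS; run++) {
        long start = System.nanoTime();
        for (int i = 0; i < appends; i++) {
            float f = r.nextFloat();
            stringSpeed1(f, f, f, f, f, f); // swap in stringSpeed2/3/4 for the other rows
        }
        totalNanos += System.nanoTime() - start;
    }
    return totalNanos / (RUNS * 1000000.0); // average milliseconds per run
}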

Questions

  1. The "+" operator is clearly not becoming the same byte code, and the time is very different from the optimal. So what gives?
  2. The behavior of the algorithms betwen 100 and 10000 number of appends is very odd to me, so does anyone have an explanation?
greedybuddha
  • The behavior of the algorithms betwen 100 and....??? – MobA11y May 23 '13 at 19:42
  • fixed, it cut it off for some reason – greedybuddha May 23 '13 at 19:43
  • Great question, with research and data to back it up. +1 – syb0rg May 23 '13 at 19:45
  • When you run these, are you optimizing? Frequently operators and function calls, when not optimized, can behave oddly in memory and cause odd performance anomalies that are not there when properly optimized. – MobA11y May 23 '13 at 19:46
  • So I'm using the standard default compiler on my computer (Mac), `java version "1.6.0_45"`. I'm not using any particular optimization flags – greedybuddha May 23 '13 at 19:47
  • Here's a good list of tuning options; it's not as important as optimizing for benchmarking in C++, but you can still tune a little bit at compile time: http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html#PerformanceTuning – MobA11y May 23 '13 at 19:50
  • Can you explain this a little bit more? I think I don't quite get it, and the data looks very interesting. What do you mean by "Graph of percentage Difference from the fastest algorithm"? What is the difference from `the fastest algorithm`? – Grzegorz Gajos May 23 '13 at 19:53
  • So what I did was take the time of the algorithm that performed best for that number of appends; at 100 it was the "+", which took 0.95ms. The rest of the algorithms are then shown by how much slower they were than that. For example, StringBuilder() was 1.2ms, so that is ~26% slower than 0.95 – greedybuddha May 23 '13 at 19:56
  • ChrisCM: I'll definitely check the flags out, but I think a lot of people don't specify when they compile their programs. And to quote from the answer of the question I linked `"Version 1("+") is preferable because it is shorter and the compiler will in fact turn it into version 2(StringBuilder) - no performance difference whatsoever."` – greedybuddha May 23 '13 at 20:05
  • @greedybuddha: people who care enough to benchmark, should at least know of them, but yes I agree, in general people let java optimize itself. – MobA11y May 23 '13 at 20:06

2 Answers

4

I did not like two things about your test case. First, you ran all of the tests within the same process. When dealing with "large" runs (ambiguous, I know), or really anything where how your process interacts with memory is your primary concern, you should always benchmark each case in a separate run; just the fact that the garbage collector has been spun up can affect results from earlier runs. Second, the way you factored your results confused me a bit. What I did was run each method on its own and knock a zero off the number of times I ran it for each set. I also let it run for a number of "reps", timing each rep, and then printed out the number of milliseconds each rep took. Here is my code:

import java.util.Random;

public class blah {

    public static void main(String[] args) {
        stringComp();
    }

    private static void stringComp() {
        int SIZE = 1000000;   // number of calls per rep; change this between runs
        int NUM_REPS = 5;     // reps per process, each timed separately
        for (int j = 0; j < NUM_REPS; j++) {
            Random r = new Random();
            float f;
            long start = System.currentTimeMillis();
            for (int i = 0; i < SIZE; i++) {
                f = r.nextFloat();
                stringSpeed3(f, f, f, f, f, f); // swap in stringSpeed1/2/4 for the other runs
            }
            System.out.print(System.currentTimeMillis() - start);
            System.out.print(", ");
        }
    }

    public static String stringSpeed1(float a, float b, float c, float x, float y, float z){
        StringBuilder sb = new StringBuilder(72).append("[").append(a).append(",").append(b).append(",").append(c).append("][").
                append(x).append(",").append(y).append(",").append(z).append("]");
        return sb.toString();
    }

    public static String stringSpeed2(float a, float b, float c, float x, float y, float z){
        StringBuilder sb = new StringBuilder().append("[").append(a).append(",").append(b).append(",").append(c).append("][").
                append(x).append(",").append(y).append(",").append(z).append("]");
        return sb.toString();
    }

    public static String stringSpeed3(float a, float b, float c, float x, float y, float z){
        return "["+a+","+b+","+c+"]["+x+","+y+","+z+"]";
    }

    public static String stringSpeed4(float a, float b, float c, float x, float y, float z){
        return String.format("[%f,%f,%f][%f,%f,%f]", a,b,c,x,y,z);
    }

}
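
To make the "benchmark in a separate run" point concrete, here is a hypothetical variation of main (the command-line handling is mine, not part of the test above): it takes the method index and the size as arguments, so each configuration can be launched in its own fresh JVM process, e.g. java blah 3 1000000.

public static void main(String[] args) {
    int method = Integer.parseInt(args[0]); // 1..4, selects stringSpeed1..stringSpeed4
    int size = Integer.parseInt(args[1]);   // number of calls per rep
    Random r = new Random();
    for (int rep = 0; rep < 5; rep++) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < size; i++) {
            float f = r.nextFloat();
            if (method == 1)      stringSpeed1(f, f, f, f, f, f);
            else if (method == 2) stringSpeed2(f, f, f, f, f, f);
            else if (method == 3) stringSpeed3(f, f, f, f, f, f);
            else                  stringSpeed4(f, f, f, f, f, f);
        }
        System.out.print((System.currentTimeMillis() - start) + ", ");
    }
}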

Now my results:

stringSpeed1(SIZE = 10000000): 11548, 11305, 11362, 11275, 11279
stringSpeed2(SIZE = 10000000): 12386, 12217, 12242, 12237, 12156
stringSpeed3(SIZE = 10000000): 12313, 12016, 12073, 12127, 12038

stringSpeed1(SIZE = 1000000): 1292, 1164, 1170, 1168, 1172
stringSpeed2(SIZE = 1000000): 1364, 1228, 1230, 1224, 1223
stringSpeed3(SIZE = 1000000): 1370, 1229, 1227, 1229, 1230

stringSpeed1(SIZE = 100000): 246, 115, 115, 116, 113
stringSpeed2(SIZE = 100000): 255, 122, 123, 123, 121
stringSpeed3(SIZE = 100000): 257, 123, 129, 124, 125

stringSpeed1(SIZE = 10000): 113, 25, 14, 13, 13
stringSpeed2(SIZE = 10000): 118, 23, 24, 16, 14
stringSpeed3(SIZE = 10000): 120, 24, 16, 17, 14

//This run SIZE is very interesting.  
stringSpeed1(SIZE = 1000): 55, 22, 8, 6, 4 
stringSpeed2(SIZE = 1000): 54, 23, 7, 4, 3
stringSpeed3(SIZE = 1000): 58, 23, 7, 4, 4

stringSpeed1(SIZE = 100): 6, 6, 6, 6, 6 
stringSpeed2(SIZE = 100): 6, 6, 5, 6, 6
stringSpeed3(SIZE = 100): 8, 6, 7, 6, 6

As you can see from my results, for values in the "middle ranges" each consecutive rep gets faster. This, I believe, is explained by the JVM getting up and running and grabbing the memory it needs. As the size increases, that effect can no longer dominate, because there is too much memory for the garbage collector to let go of and for the process to latch back onto. Also, when you do a "repetitive" benchmark like this, where most of your process can live in the lower levels of cache rather than in RAM, your process is even more sensitive to the branch predictors. These are very smart, catch on to what your process is doing, and I imagine the JVM amplifies this. This also helps explain why the values on the initial loops are slower, and why the way you were approaching this benchmark was a poor solution; it is why I think your results for values that aren't "large" are skewed and seem odd. Then, as the memory footprint of your benchmark increased, branch prediction had less effect (percentage-wise) than the large strings you were appending being shifted around in RAM.

Simplified conclusion: your results for "large" runs are reasonably valid and seem similar to mine (though I still don't completely understand how you got your results, the percentages seem to line up well in comparison). However, your results for the smaller runs are not valid, because of the nature of your test.
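
For what it's worth, one way to keep that first-rep start-up cost out of the numbers is to do some untimed warm-up iterations before measuring. A minimal sketch, meant to be dropped into the class above (the warm-up count of 20000 is arbitrary):

private static void warmedUpStringComp(int size) {
    Random r = new Random();
    for (int i = 0; i < 20000; i++) {   // untimed warm-up iterations
        float f = r.nextFloat();
        stringSpeed3(f, f, f, f, f, f);
    }
    long start = System.currentTimeMillis();
    for (int i = 0; i < size; i++) {    // timed run
        float f = r.nextFloat();
        stringSpeed3(f, f, f, f, f, f);
    }
    System.out.println((System.currentTimeMillis() - start) + " ms");
}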

MobA11y
  • I think you're onto something with the branch predictors. I figured it was the GC skewing the smaller runs, but I really love your drop from reps 1 and 2 to 3... that speaks volumes to me. As for the larger runs, you are showing the same effect: the methods are not equivalent – greedybuddha May 23 '13 at 21:36
  • You are postulating many assumptions here, most of which are definitively wrong. The most obvious reason for the speed-up for lower batch sizes is that the JVM starts with interpreting the byte code, after a few runs the byte code is compiled into native machine code (causing a one time penalty, but a significant speedup on subsequent runs) and after more runs, the JVM may even recompile the byte code several times using different optimization strategies based on collected run time statistics. – jarnbjo May 23 '13 at 21:37
  • Also, I'm not sure what you mean by how I got my results. I just totaled the times for each repetition, similar to what you did but with +=, then divided by the number of reps at the end. – greedybuddha May 23 '13 at 21:38
  • @jarnbjo: What I meant by "the way you factored your results" (poor wording) was your percentages. I didn't understand your approach to presenting your results; I looked at it again and it makes more sense now! – MobA11y May 23 '13 at 23:14
  • @jarnbjo: I did not mean my hypothesis for my conclusions to be definitive (note the "I believe"); I was sharing results and guessing at the reason. I'm more of a C++ guy than a Java guy. Your explanation seems reasonable, though I doubt branch prediction and memory management have nothing to do with it, and I did mention that the JVM was probably amplifying these effects; I just don't know the innards of the JVM well enough to put that in more concise terms. It is likely a combination of the two explanations – MobA11y May 23 '13 at 23:43
3

The Java Language Specification does not specify how the string concatenation is performed, but I doubt that your compiler does anything but the equivalent of:

new StringBuilder("[").
  append(a).
  append(",").
  append(b).
  append(",").
  append(c).
  append("][").
  append(x).
  append(",").
  append(y).
  append(",").
  append(z).
  append("]").
  toString();

You can use "javap -c ..." to decompile your class file and verify this.

If you measure any significant and repetitive difference in runtime between your methods, I would much rather assume that the garbage collector is running at different times, than that there is any actual, significant performance difference. Creating the StringBuilders with different initial capacities may of course have some impact, but it should be insignificant compared to the effort required to e.g. format the floats.
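
As a concrete sketch of that equivalence (the method name stringSpeedDirect is just for illustration; check the actual javap -c output against your own compiler), a builder chain that is returned directly, without a local variable, should be closest to what the compiler emits for the "+" version:

public static String stringSpeedDirect(float a, float b, float c, float x, float y, float z) {
    // Returned directly: no local StringBuilder variable (see the comments below for why that matters).
    return new StringBuilder("[").append(a).append(",").append(b).append(",").append(c)
            .append("][").append(x).append(",").append(y).append(",").append(z)
            .append("]").toString();
}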

jarnbjo
  • I have decompiled the code, and the "+" operator is different from the StringBuilder(), though I didn't try it against StringBuilder("["); I'll look at that – greedybuddha May 23 '13 at 21:18
  • Can you add the output of javap to your question? – jarnbjo May 23 '13 at 21:39
  • I linked it in the question as well. https://gist.github.com/parnell/5c34f244e6f4588cc9ac. All of them are fairly similar, but the few differences seem to start causing noticeable differences at large volume runs. – greedybuddha May 23 '13 at 21:43
  • If I'm not completely wrong, based on the byte code, your stringSpeed5 method is implemented as `StringBuilder sb = new StringBuilder("[")... ; return sb.toString();` The equivalent method with string concatenation would have been `String s = "[" + ...; return s;`. If you don't use a local variable, but implement the 5th method directly as `return new StringBuilder("[")...toString();` you should get the same byte code. – jarnbjo May 23 '13 at 21:50
  • So that looks like the magic. Between the new StringBuilder("[") and returning directly, that finally makes the byte code equivalent. This means two things to me. 1) The answer to the other question is still wrong, as the original poster assigns a new StringBuilder(capacity) and returns it, which doesn't lead to equivalent byte code. 2) I'm very disappointed that the Java compiler isn't smart enough to turn "assign a new object and immediately return it" into just returning it... I was sure that was something it did automatically. – greedybuddha May 24 '13 at 00:02
  • The Java compiler (by that I mean the tool javac, which compiles source code to byte code) is actually not particularly clever when it comes to optimization, but it has a reason: it knows absolutely nothing about the system where the code will eventually be executed. Therefore optimizations are performed by the JVM at runtime, which makes it very difficult to benchmark the impact of modifications to the source code, since the actually executed machine code may even change now and then if the VM decides to reoptimize the code at any time. – jarnbjo May 24 '13 at 08:21