
I was reading about how, when possible, the Java compiler compiles strings concatenated with the "+" operator into StringBuilder calls, and how this makes the simple "+" operator the better choice since both compile to the same code. (The exception is building the string in a loop, in which case it is clearly best to use a StringBuilder.)

I've also read that the .concat method on strings is always the worst choice (so much so that FindBugs flags it as a bug!).

So I decided to test it myself by writing a little Java class in Eclipse. My results surprised me a bit: which methods were relatively faster or slower depended on whether I compiled and ran them in Eclipse or on the command line.

First my eclipse results were:

the total millis to concatenate with + was: 12154
the total millis to concatenate with .concat was: 8840
the total millis to concatenate with StringBuilder was: 11350
the total millis to concatenate with StringBuilder with a specified size was: 5611

So in eclipse StringBuilder with the size specified was fastest, followed by .concat (weird), then StringBuilder and "+" concatenation were pretty much the same.

My results on the command line, however, were:

the total millis to concatenate with + was: 4139
the total millis to concatenate with .concat was: 8590
the total millis to concatenate with StringBuilder was: 10888
the total millis to concatenate with StringBuilder with a specified size was: 6033

So when I compiled and ran from the command line, the "+" operator was clearly the fastest, followed by StringBuilder with a specified size, then .concat, and last was the plain StringBuilder!

This doesn't make sense to me. Apparently all the Stack Overflow answers I read saying that "+" operators compile into plain old StringBuilder instances must be outdated.

Does anyone know what's really going on here?

I'm using jdk1.7.0_07, and so far as I can tell both eclipse and my command line are referencing the exact same one. The only difference I know of is eclipse is using "javaw", but from what I've read, that shouldn't make a difference.

Here's my test class if you want to verify I'm not doing anything wrong, but I'm pretty sure it's solid.

import java.util.Calendar;

public class Test {

    static final int LOOPS = 100000000;
    static final String FIRST_STRING = "This is such";
    static final String SECOND_STRING = " an awesomely cool ";
    static final String THIRD_STRING = "to write string.";

    /**
     * @param args
     */
    public static void main(String[] args) {

        Test.plusOperator();
        Test.dotConcat();
        Test.stringBuilder();
        Test.stringBuilderSizeSpecified();

    }

    public static void plusOperator() {
        String localOne = FIRST_STRING;
        String localTwo = SECOND_STRING;
        String localThree = THIRD_STRING;

        Calendar startTime = Calendar.getInstance();
        for (int x = 0; x < LOOPS; x++) {
            String toPrint = localOne + localTwo + localThree;
        }
        Calendar endTime = Calendar.getInstance();
        System.out.println("the total millis to concatenate with + was: " + 
                (endTime.getTimeInMillis() - startTime.getTimeInMillis()));
    }

    public static void stringBuilder() {
        String localOne = FIRST_STRING;
        String localTwo = SECOND_STRING;
        String localThree = THIRD_STRING;

        Calendar startTime = Calendar.getInstance();
        for (int x = 0; x < LOOPS; x++) {
            StringBuilder toBuild = new StringBuilder()
                .append(localOne)
                .append(localTwo)
                .append(localThree);
        }
        Calendar endTime = Calendar.getInstance();
        System.out.println("the total millis to concatenate with StringBuilder was: " + 
                (endTime.getTimeInMillis() - startTime.getTimeInMillis()));
    }

    public static void stringBuilderSizeSpecified() {
        String localOne = FIRST_STRING;
        String localTwo = SECOND_STRING;
        String localThree = THIRD_STRING;

        Calendar startTime = Calendar.getInstance();
        for (int x = 0; x < LOOPS; x++) {
            StringBuilder toBuild = new StringBuilder(50)
                .append(localOne)
                .append(localTwo)
                .append(localThree);
        }
        Calendar endTime = Calendar.getInstance();
        System.out.println("the total millis to concatenate with StringBuilder with a specified size was: " + 
                (endTime.getTimeInMillis() - startTime.getTimeInMillis()));
    }

    public static void dotConcat() {
        String localOne = FIRST_STRING;
        String localTwo = SECOND_STRING;
        String localThree = THIRD_STRING;

        Calendar startTime = Calendar.getInstance();
        for (int x = 0; x < LOOPS; x++) {
            String toPrint = localOne.concat(localTwo).concat(localThree);
        }
        Calendar endTime = Calendar.getInstance();
        System.out.println("the total millis to concatenate with .concat was: " + 
                (endTime.getTimeInMillis() - startTime.getTimeInMillis()));
    }

}
CorayThan
  • It would be useful to know the details of the JVM you are using. – Chris Knight Mar 13 '13 at 22:17
  • Your benchmark methodology is extremely suspect. It doesn't allow time for the JIT to warm up; it uses `Calendar` instead of the designated absolute-time-difference `System.nanoTime()`... – Louis Wasserman Mar 13 '13 at 22:20
  • I've run it many times with the different methods called in different orders. It can make a difference of 100 or so milliseconds, but we're talking about differences of thousands of milliseconds here, so I don't see how System.nanoTime() or the JIT warming up could account for it. – CorayThan Mar 13 '13 at 22:22
  • @Chris Knight I added information about the jdk I am using. It is jdk1.7.0_07. As far as I can tell I'm just using the default VM. – CorayThan Mar 13 '13 at 22:23
  • *Obviously all the stackoverflow answers I read saying that + operators compile into normal old StringBuilder instances must be outdated*: run `javap -c` on your class, and look at the generated bytecode. – JB Nizet Mar 13 '13 at 22:30
  • Try printing the value of System.getProperty("java.vm.version") and System.getProperty("java.vm.vendor") in your main method just to make sure that the command line and eclipse is using the same JVM. – uldall Mar 13 '13 at 22:31
  • I reduced `LOOPS` to `10000000` and ran the tests in a loop. After a while the timing stabilized to: `+ was: 3663`, `.concat was: 2093`, `StringBuilder was: 2952`, `StringBuilder with a specified size was: 1327`. Much as I expected, `+` is nearly as fast as `StringBuilder`, and the fastest is `StringBuilder` with a predefined size. – MrSmith42 Mar 13 '13 at 22:39
  • @uldall They both say "23.3-b01" so that should be the same. – CorayThan Mar 13 '13 at 22:39
  • @MrSmith42 I tried what you did but I got completely different results. Mine stabilized to: + was: 294, .concat was: 564, StringBuilder was: 775, StringBuilder with specified size was: 375. So I'm still seeing + come out faster, strangely. – CorayThan Mar 13 '13 at 23:11

2 Answers


On Oracle JDK 1.7 (javac 1.7.0_17), the "+" operator is still implemented using StringBuilder, as shown by running javap -c on the class to get the bytecode (only showing the loops here):

public static void plusOperator();
Code:

  16: iload         4
  18: ldc           #10                 // int 100000000
  20: if_icmpge     53
  23: new           #11                 // class java/lang/StringBuilder
  26: dup           
  27: invokespecial #12                 // Method java/lang/StringBuilder."<init>":()V
  30: aload_0       
  31: invokevirtual #13                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  34: aload_1       
  35: invokevirtual #13                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  38: aload_2       
  39: invokevirtual #13                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  42: invokevirtual #14                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
  45: astore        5
  47: iinc          4, 1
  50: goto          16


public static void stringBuilder();
Code:

  16: iload         4
  18: ldc           #10                 // int 100000000
  20: if_icmpge     50
  23: new           #11                 // class java/lang/StringBuilder
  26: dup           
  27: invokespecial #12                 // Method java/lang/StringBuilder."<init>":()V
  30: aload_0       
  31: invokevirtual #13                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  34: aload_1       
  35: invokevirtual #13                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  38: aload_2       
  39: invokevirtual #13                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
  42: astore        5
  44: iinc          4, 1
  47: goto          16

The only difference between these two is that the version with "+" converts the StringBuilder to a String within the loop.
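To make the bytecode above easier to read, here is a minimal sketch of the two forms side by side in source. The class and method names (`PlusDesugared`, `viaPlus`, `viaBuilder`) are made up for illustration, and the equivalence shown applies to the JDK 7 javac output listed above; other compilers or later JDK versions may translate "+" differently.

```java
class PlusDesugared {

    // What the source in plusOperator() writes.
    static String viaPlus(String a, String b, String c) {
        return a + b + c;
    }

    // Roughly what the javac 1.7 bytecode above does for that expression:
    // one StringBuilder, three appends, then toString().
    static String viaBuilder(String a, String b, String c) {
        return new StringBuilder().append(a).append(b).append(c).toString();
    }

    public static void main(String[] args) {
        String one = "This is such";
        String two = " an awesomely cool ";
        String three = "to write string.";
        // Both forms build the same text.
        System.out.println(viaPlus(one, two, three).equals(viaBuilder(one, two, three)));
    }
}
```

Note that the stringBuilder() test in the question stops before the final toString() call, so it does slightly less work per iteration than the "+" version.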

So the question becomes: why does your test show such different results for the same code? Or, more completely, why is this not a valid micro-benchmark? Here are some possible reasons:

  • You're counting wall-clock time. This means that you're actually measuring everything that the JVM is doing while running your test, including garbage collection (which is important because you're creating a lot of garbage). You can mitigate this by measuring the thread CPU time instead.
  • You don't verify when or if HotSpot is compiling the methods. This is why you should do a warmup phase before any micro-benchmarks: basically, run your main() multiple times, before you run your actual test.
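The two points above could be combined along these lines. This is only a sketch under assumptions, not a rigorous benchmark: the class name `ConcatTiming`, the `sink` field, the warmup count, and the reduced loop count are all invented for illustration. It reads the current thread's CPU time through `ThreadMXBean` (falling back to `System.nanoTime()` where that is unavailable) and runs a few untimed-in-spirit warmup rounds before the reported measurement.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class ConcatTiming {

    static final int WARMUP_ROUNDS = 5;   // assumed value, tune as needed
    static final int LOOPS = 1000000;     // smaller than the question's 100000000

    static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();
    static final boolean USE_CPU_TIME =
            THREADS.isCurrentThreadCpuTimeSupported() && THREADS.isThreadCpuTimeEnabled();

    // Results are folded into this field so the loop bodies are actually used,
    // which makes it harder for the JIT to discard them as dead code.
    static int sink;

    // Thread CPU time in nanoseconds, or elapsed time if CPU time is unavailable.
    static long now() {
        return USE_CPU_TIME ? THREADS.getCurrentThreadCpuTime() : System.nanoTime();
    }

    static long timePlus(String a, String b, String c, int loops) {
        long start = now();
        for (int x = 0; x < loops; x++) {
            sink += (a + b + c).length();
        }
        return now() - start;
    }

    static long timeBuilder(String a, String b, String c, int loops) {
        long start = now();
        for (int x = 0; x < loops; x++) {
            sink += new StringBuilder().append(a).append(b).append(c).toString().length();
        }
        return now() - start;
    }

    public static void main(String[] args) {
        String one = "This is such";
        String two = " an awesomely cool ";
        String three = "to write string.";

        // Warmup phase: give HotSpot a chance to compile both methods first.
        for (int i = 0; i < WARMUP_ROUNDS; i++) {
            timePlus(one, two, three, LOOPS);
            timeBuilder(one, two, three, LOOPS);
        }

        System.out.println("+ took " + timePlus(one, two, three, LOOPS) / 1000000 + " ms");
        System.out.println("StringBuilder took " + timeBuilder(one, two, three, LOOPS) / 1000000 + " ms");
    }
}
```

Even with these changes the numbers should be read with caution; the sketch only addresses the two issues listed above.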
parsifal
  • And, of course, the toString() method makes a difference: it must create a new char array, copy chars from the char array of the builder to the char array of the string, construct a new String object, and garbage collect it. – JB Nizet Mar 13 '13 at 22:40
  • Given your post and MrSmith's comment, I agree that my micro-benchmark was invalid. I still think it's strange that it takes nested loops run thousands of times to "stabilize" the results, though. If I do a warmup, does that make it a more realistic benchmark? Would string concatenations in, say, a real web application be optimized that way, or would they often behave more like the pre-optimization case? In the end it doesn't matter (this is all more theoretical than practical anyway), but they seem like interesting questions to me. – CorayThan Mar 13 '13 at 22:48
  • Brian Goetz wrote an article about micro-benchmarking several years ago. I don't have the link handy, but it should turn up quickly on Google. – parsifal Mar 13 '13 at 22:50
  • Good article on how it works in java 8 http://www.pellegrino.link/2015/08/22/string-concatenation-with-java-8.html – Viktor Mellgren May 10 '17 at 09:23

Try placing StringBuilder toBuild = new StringBuilder() above the loop. Do the same with String toPrint, using += for the string, and you will see the difference.
Don't create a new String and StringBuilder inside the loop.
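A rough sketch of what that suggestion looks like, with hypothetical names (`HoistedBuilder`, `withPlusEquals`, `withHoistedBuilder`). Note that this accumulates one growing string across iterations, which is a different workload from the single-expression concatenation timed in the question.

```java
class HoistedBuilder {

    // String += in a loop: every iteration builds a brand-new String
    // containing everything accumulated so far plus the new piece.
    static String withPlusEquals(String piece, int loops) {
        String toPrint = "";
        for (int x = 0; x < loops; x++) {
            toPrint += piece;
        }
        return toPrint;
    }

    // One StringBuilder declared above the loop and reused by every iteration.
    static String withHoistedBuilder(String piece, int loops) {
        StringBuilder toBuild = new StringBuilder();
        for (int x = 0; x < loops; x++) {
            toBuild.append(piece);
        }
        return toBuild.toString();
    }

    public static void main(String[] args) {
        String a = withPlusEquals("ab", 1000);
        String b = withHoistedBuilder("ab", 1000);
        // Same result either way; only the amount of copying differs.
        System.out.println(a.equals(b) + " " + a.length());
    }
}
```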

Alex
  • I'm testing the single-line performance of the different concatenation methods. I do know that StringBuilder would be MUCH superior if I were testing the performance of one string being built up in a loop. The purpose of the loop is to average out the random differences of the single-line concatenation. I just think it's weird that the + operator seems superior to StringBuilder for in-line string concatenation, when I've read that the + operator compiles to a StringBuilder. – CorayThan Mar 13 '13 at 22:29