2

I came across the following code:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100000; i++) {
   String line = "foo" + Integer.toString(i) + "bar\n";
   sb.append(line);
}
System.out.print(sb.toString());

My question is what the performance/memory differences would be to use the following code instead:

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100000; i++) {
   sb.append("foo").append(Integer.toString(i)).append("bar\n");
}
System.out.print(sb.toString());

or

StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100000; i++) {
   sb.append(String.format("foo %d bar\n", i));
}
System.out.print(sb.toString());

I saw a lot of questions/answers showing that using: String cumulativeOutput = ""; ... cumulativeOutput += "foo" + Integer.toString(i) + "bar\n"; is wasteful (essentially creating a new StringBuilder on every loop iteration), but didn't see any direct answers regarding the above situation. I have also seen that the length of the strings matters, and would be interested in seeing how the following code would compare:

StringBuilder sb = new StringBuilder();
String foo = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. ";
String bar = " Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.\n";
for (int i = 1000000000; i < 1000100000; i++) {
   String line = foo + Integer.toString(i) + bar;
   sb.append(line);
}
System.out.print(sb.toString());

I expect the second loop (chaining .append) to be the fastest, but I am interested in whether there are any hidden details. Looking forward to seeing what others find or know!

vrysk
  • 3
    The code you show with `String`, `StringBuilder`, and concatenation has evolved over time with changing optimizations. So one approach might have been faster in one version of Java while another in another. I suggest you not fall victim to [premature optimization](https://stackoverflow.com/q/385506/642706) and instead write the simplest clearest code possible. Worry about optimizing only after you have a known, provable performance problem. – Basil Bourque Feb 05 '22 at 00:52
  • You might also get better performance out of `StringBuilder` if you pre-size the buffer (insert techno babble): instead of `StringBuilder sb = new StringBuilder();`, use something like `StringBuilder sb = new StringBuilder(32);`. But I agree with Basil: unless you're facing a particular problem, try not to "out guess" the compiler – MadProgrammer Feb 05 '22 at 01:01
  • 1
    To that end, using a single type of concatenation (the string builder) is arguably simpler than using two types of string concatenation ("manually" concatenating into a single string that you then feed to the string builder, or formatting a string for the string builder), so I'd go with the middle example. Though I'd be tempted to set its initial capacity at around a million characters, which a quick eyeballing of the code suggests as being in the right ballpark -- 8 to 12 characters per iteration. – passer-by Feb 05 '22 at 01:02
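
The pre-sizing suggested in the two comments above could look roughly like the sketch below. The class name and the 12-character bound are assumptions made for illustration; the bound comes from "foo" (3 characters), at most 5 digits for values below 100000, and "bar\n" (4 characters).

```java
final class PresizedBuilderSketch {
    // Assumed upper bound per line: "foo" (3) + up to 5 digits + "bar\n" (4).
    private static final int MAX_CHARS_PER_LINE = 12;

    static String build(int count) {
        // Reserve the estimated capacity once, so the builder should not
        // need to grow its internal buffer while the loop runs.
        StringBuilder sb = new StringBuilder(count * MAX_CHARS_PER_LINE);
        for (int i = 0; i < count; i++) {
            sb.append("foo").append(i).append("bar\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print only the length to keep the demo output short.
        System.out.println(build(100_000).length());
    }
}
```

The estimate only needs to be in the right ballpark: if it is too small the builder still grows as usual, and if it is too large some memory is wasted.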

3 Answers

3

Your first example:

String line = "foo" + Integer.toString(i) + "bar\n";
sb.append(line);

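is equivalent to this (roughly what javac generated before Java 9; since Java 9 the + is compiled to an invokedynamic call instead, but a temporary String is still created on every iteration):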

String line = new StringBuilder("foo").append(Integer.toString(i)).append("bar\n").toString();
sb.append(line);

Your third example:

sb.append(String.format("foo %d bar\n", i));

is the most wasteful, because it’s equivalent to this:

StringBuilder builder = new StringBuilder();
Formatter formatter = new Formatter(builder);
formatter.format("foo %d bar\n", i);
String value = formatter.out().toString();
sb.append(value);

Your second example has the best performance, but the use of Integer.toString is redundant and creates an unnecessary String:

sb.append("foo").append(i).append("bar\n");

In this form, no new objects are created (except whenever StringBuilder has to expand its internal buffer).

The difference will be quite small, often too small to notice. A modern system that can run Microsoft Office or Doom Eternal probably isn’t going to slow down noticeably from creating strings. And it’s entirely possible that the just-in-time compiler will turn append(Integer.toString(i)) into append(i) at runtime.

If you want guidance on the best practice, the answer is: don’t use + or String.format at all in a loop.

VGR
  • 1
    It’s often overlooked that the decimal system is not natural to the computer and converting a number into decimal form is expensive in general, regardless of which approach is used, and hence, may dwarf the minor differences. I’d go even a step further and say, don’t use the `Formatter` API (or one of its façades) at all when you’re not using one of its specific features, i.e. locale sensitive formatting, padding, or alignment. If you still need it, put `Formatter formatter = new Formatter(sb);` before the loop (using the outer builder) and only `formatter.format(…);` into the loop body. – Holger Feb 11 '22 at 12:21
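
A minimal sketch of the comment's suggestion, for the case where Formatter features are really needed: one Formatter is created before the loop and writes straight into the outer builder. The class name is hypothetical, and passing Locale.ROOT is an added assumption to keep the digits locale-independent; the comment itself uses `new Formatter(sb)`.

```java
import java.util.Formatter;
import java.util.Locale;

final class ReusedFormatterSketch {
    static String build(int count) {
        StringBuilder sb = new StringBuilder();
        // A single Formatter wraps the outer builder, so no per-iteration
        // Formatter, inner StringBuilder, or temporary String is needed.
        // Locale.ROOT is an assumption, not part of the original comment.
        Formatter formatter = new Formatter(sb, Locale.ROOT);
        for (int i = 0; i < count; i++) {
            formatter.format("foo %d bar\n", i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(build(3));
    }
}
```

The format string is still parsed on every call, so this removes the extra allocations but not the parsing cost.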
1
  1. As Basil Bourque points out in a comment, your question smells of premature optimization. The "which is most efficient" question actually has a second meaning that you didn't consider: which is the most efficient use of your time.

  2. The answer to the "which is more efficient" question that you actually asked is going to depend on the JVM version that you use. In some cases there will be differences. In others, the JIT compiler (mostly) is likely to optimize away the differences.

  3. The way to find out which is actually faster, and what the actual differences are is to write a proper micro-benchmark ... and run it on various hardware and various Java versions.

    Important! Read this: How do I write a correct micro-benchmark in Java? (https://stackoverflow.com/questions/504103)

  4. I predict that the differences between the first two will be small to non-existent, and that the format approach will be slower.

  5. The big picture is that the differences between the 3 versions are probably too small to matter in most real world applications. You are probably wasting your time considering which of these versions is fastest. There are probably better things to spend your time on.

Point 2) means that we can't >tell< you which version is more efficient. Point 3) means it will cost you effort to get a definitive answer. Points 4) and 5) mean that it is probably not worth the effort getting a definitive answer. (Not least because the answer will only be definitive for the precise example that you test, on the specific platforms that you test on.)


The only other things to point out are that using StringBuilder is correct if it is essential to assemble the entire string in memory. But if not:

  • It may be more efficient to write directly to a BufferedWriter or similar. There are second-order effects to assembling a large character string in memory in terms of allocating large objects and its impact on GC behavior and memory caches and the like.

  • Any solution that entails assembling something large in the heap does not scale. If the thing you are assembling gets too large, you run out of heap space and ultimately physical RAM space.
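
As a rough illustration of the first bullet, the sketch below writes each piece straight to a Writer instead of assembling the whole output first. The class and method names are hypothetical, and wrapping System.out in a BufferedWriter is just one possible target.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

final class StreamingOutputSketch {
    // Writes the lines piece by piece; nothing large is held in the heap.
    static void writeLines(Writer out, int count) throws IOException {
        for (int i = 0; i < count; i++) {
            out.write("foo");
            // Writer.write(int) would write a single char, so convert explicitly.
            out.write(Integer.toString(i));
            out.write("bar\n");
        }
    }

    // Helper for small inputs only: captures the output in memory for inspection.
    static String renderToString(int count) {
        StringWriter sw = new StringWriter();
        try {
            writeLines(sw, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) throws IOException {
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(System.out));
        writeLines(out, 100_000);
        // Flush rather than close, so System.out stays usable afterwards.
        out.flush();
    }
}
```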

Stephen C
0

I think it's a bit faster to write a simple test that shows the performance than to actually write this question :)

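// Imports this snippet needs (JUnit 5 is assumed, since the test method is package-private)
import java.time.Duration;
import java.time.Instant;
import org.junit.jupiter.api.Test;

    @Test
    void testBuildingString() {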
        var start1 = Instant.now();

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            String line = "foo" + Integer.toString(i) + "bar\n";
            sb.append(line);
        }

        var end1 = Instant.now();
        var start2 = Instant.now();

        StringBuilder sb2 = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb2.append("foo").append(Integer.toString(i)).append("bar\n");
        }

        var end2 = Instant.now();
        var start3 = Instant.now();
        StringBuilder sb3 = new StringBuilder();
        for (int i = 0; i < 100000; i++) {
            sb3.append(String.format("foo %d bar", i));
        }
        var end3 = Instant.now();

        System.out.println("Time 1 = " + Duration.between(start1, end1));
        System.out.println("Time 2 = " + Duration.between(start2, end2));
        System.out.println("Time 3 = " + Duration.between(start3, end3));
    }

And the results are as follows, and as expected:

Time 1 = PT0.0299933S
Time 2 = PT0.0059999S
Time 3 = PT0.2450021S

Your 2nd option is the fastest, around 5 times faster than the first. String.format is the slowest of the bunch. Note that this is 100,000 iterations, so per iteration the difference between the first two options is only about 0.00000024s.

Piotr73
  • 2
    I am afraid that this benchmark is flawed. It doesn't mitigate the effects of JVM warmup, JIT compilation and garbage collection runs. Read https://stackoverflow.com/questions/504103 – Stephen C Feb 05 '22 at 03:50
  • 1
    Put the call in a loop 1..10 and you'd see quite different timings. Having said that though, overall I'd expect #2 to be quickest and lowest memory churn on application servers, and improved with @VGR tip to replace `Integer.toString(i)` by `i` – DuncG Feb 05 '22 at 11:28
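
A crude sketch of the "loop 1..10" idea from the comment above: each variant is timed over several rounds, so the early rounds absorb class loading and JIT warm-up. All names here are hypothetical, and the format variant uses "foo%dbar\n" (an assumption) so that all three build comparable output. This is not a substitute for a proper micro-benchmark harness; the numbers will vary by JVM and hardware.

```java
import java.util.function.IntFunction;

final class RepeatedTimingSketch {
    // Variant 1: build a temporary String with +, then append it.
    static String concatThenAppend(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            String line = "foo" + Integer.toString(i) + "bar\n";
            sb.append(line);
        }
        return sb.toString();
    }

    // Variant 2: chain append calls on the outer builder.
    static String chainedAppend(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("foo").append(i).append("bar\n");
        }
        return sb.toString();
    }

    // Variant 3: String.format, then append the result.
    static String formatThenAppend(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(String.format("foo%dbar\n", i));
        }
        return sb.toString();
    }

    // Times one call and inspects the result so the work is actually used.
    static long timeNanos(IntFunction<String> variant, int count) {
        long start = System.nanoTime();
        String result = variant.apply(count);
        long elapsed = System.nanoTime() - start;
        if (count > 0 && result.isEmpty()) {
            throw new IllegalStateException("variant produced no output");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int count = 100_000;
        // Later rounds are more representative than the first ones.
        for (int round = 1; round <= 10; round++) {
            long t1 = timeNanos(RepeatedTimingSketch::concatThenAppend, count);
            long t2 = timeNanos(RepeatedTimingSketch::chainedAppend, count);
            long t3 = timeNanos(RepeatedTimingSketch::formatThenAppend, count);
            System.out.printf("round %d: concat=%d us, chained=%d us, format=%d us%n",
                    round, t1 / 1_000, t2 / 1_000, t3 / 1_000);
        }
    }
}
```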