246

We have to build Strings all the time for log output and so on. Over the JDK versions we have learned when to use StringBuffer (many appends, thread safe) and StringBuilder (many appends, non-thread-safe).

What's the advice on using String.format()? Is it efficient, or are we forced to stick with concatenation for one-liners where performance is important?

e.g. ugly old style,

String s = "What do you get if you multiply " + varSix + " by " + varNine + "?";

vs. tidy new style (String.format, which is possibly slower),

String s = String.format("What do you get if you multiply %d by %d?", varSix, varNine);

Note: my specific use case is the hundreds of 'one-liner' log strings throughout my code. They don't involve a loop, so StringBuilder is too heavyweight. I'm interested in String.format() specifically.

pasignature
  • 577
  • 1
  • 6
  • 13
Air
  • 5,084
  • 5
  • 25
  • 19
  • 31
    Why don't you test it? – Ed S. Feb 04 '09 at 22:15
  • 1
    If you are producing this output, then I assume it has to be readable by a human at a rate a human can read it. Let's say 10 lines per second at most. I think you will find it really doesn't matter which approach you take; if it is notionally slower, the user might appreciate it. ;) So no, StringBuilder is not heavyweight in most situations. – Peter Lawrey Aug 15 '09 at 16:37
  • 11
    @Peter, no it's absolutely not for reading in real time by humans! It's there to help analysis when things go wrong. Log output will typically be thousands of lines per second, so it needs to be efficient. – Air Sep 15 '10 at 21:03
  • 5
    If you are producing many thousands of lines per second, I would suggest: 1) use shorter text, or even no text, such as plain CSV or binary; 2) don't use String at all, since you can write the data into a ByteBuffer without creating any objects (as text or binary); 3) background the writing of data to disk or a socket. You should be able to sustain around 1 million lines per second (basically as much as your disk subsystem will allow). You can achieve bursts of 10x this. – Peter Lawrey Sep 16 '10 at 07:29
  • 7
    This isn't relevant to the general case, but for logging in particular, LogBack (written by the original Log4j author) has a form of parameterized logging that addresses this exact problem - http://logback.qos.ch/manual/architecture.html#ParametrizedLogging – Matt Passell Sep 22 '10 at 15:33
  • As a side-note: At least the Harmony/Android implementation of `String.format()` uses a `StringBuilder` internally. So if `String.format()` would be ok, `StringBuilder` alone shall be ok as well. – sstn Oct 25 '13 at 07:48
  • After reading the question, ALL the answers, and ALL the comments, I'm still wondering what is best for defining an exception message. :S – White_King Dec 17 '22 at 09:12

13 Answers

261

I took hhafez's code and added a memory test:

private static void test() {
    Runtime runtime = Runtime.getRuntime();
    long memory;
    ...
    memory = runtime.freeMemory();           // free heap before the loop
    // for loop code
    memory = memory - runtime.freeMemory();  // approximate bytes consumed by the loop
    ...
}

I ran this separately for each approach (the '+' operator, String.format, and StringBuilder with a final toString()), so the memory measured for one approach is not affected by the others. I also added more concatenations, making the string "Blah" + i + "Blah" + i + "Blah" + i + "Blah".

The results are as follows (average of 5 runs each):

Approach        Time (ms)   Memory allocated (long)
+ operator            747   320,504
String.format       16484   373,312
StringBuilder         769   57,344

We can see that the + operator and StringBuilder are practically identical time-wise, but StringBuilder is much more efficient in memory use. This is very important when we have many log calls (or any other statements involving strings) in a time interval short enough that the garbage collector won't get to clean up the many string instances resulting from the + operator.

And a note, BTW, don't forget to check the logging level before constructing the message.
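As a minimal sketch of that level check, assuming java.util.logging (the answer does not name a logging framework): the class name GuardedLogging, the helper buildMessage and the builds counter are hypothetical and exist only to show when the string is actually constructed.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());

    // Hypothetical counter: how many times the message string was actually built.
    static int builds = 0;

    // Hypothetical helper that does the (comparatively costly) string construction.
    static String buildMessage(int six, int nine) {
        builds++;
        return "What do you get if you multiply " + six + " by " + nine + "?";
    }

    static void logQuestion(int six, int nine) {
        // Explicit guard: the string is only built when FINE is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(buildMessage(six, nine));
        }
        // Java 8+ alternative: the Supplier is only invoked when FINE is enabled.
        LOG.fine(() -> buildMessage(six, nine));
    }

    public static void main(String[] args) {
        LOG.setLevel(Level.INFO); // FINE is disabled at this level
        logQuestion(6, 9);
        System.out.println("messages built: " + builds);
    }
}
```

With the level set to INFO, neither call should build the message, so the cost of the concatenation is only paid when the line is really going to be logged.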

Conclusions:

  1. I'll keep on using StringBuilder.
  2. I have too much time or too little life.
Edric
  • 24,639
  • 13
  • 81
  • 91
Itamar
  • 2,707
  • 2
  • 15
  • 2
  • 9
    "don't forget to check the logging level before constructing the message" is good advice. This should be done at least for debug messages, because there could be a lot of them and they should not be enabled in production. – stivlo Oct 13 '11 at 03:52
  • 49
    No, this is not right. Sorry to be blunt but the number of upvotes it has attracted is nothing short of alarming. Using the `+` operator compiles to the equivalent `StringBuilder` code. Microbenchmarks like this are not a good way of measuring performance - why not use jvisualvm, it's in the jdk for a reason. `String.format()` *will* be slower, but due to the time to parse the format string rather than any object allocations. Deferring the creation of logging artifacts until you're sure they're needed *is* good advice, but if it would be having a performance impact it's in the wrong place. – CurtainDog Apr 09 '13 at 06:22
  • 1
    @CurtainDog, your comment was made on a four-year old post, can you point to documentation or create a separate answer to address the difference? – kurtzbot Aug 27 '14 at 18:31
  • 1
    Reference in support of @CurtainDog's comment: http://stackoverflow.com/a/1532499/2872712. That is, + is preferred unless it is done in a loop. – apricot May 03 '16 at 18:29
  • `And a note, BTW, don't forget to check the logging level before constructing the message.` is not good advice. Assuming we're talking about `java.util.logging.*` specifically, checking the logging level is when you're talking about doing advanced processing that would cause adverse effects on a program that you wouldn't want when a program does not have logging turned on to the appropriate level. String formatting is not that type of processing AT ALL. Formatting is part of the `java.util.logging` framework, and the logger itself checks the logging level before the formatter is ever invoked. – searchengine27 Apr 10 '19 at 13:48
135

I wrote a small class to test which of the two performs better, and + comes out ahead of format by a factor of 5 to 6. Try it yourself:

public class StringTest {

    public static void main(String[] args) {
        long prev_time = System.currentTimeMillis();
        long time;

        // Time 100,000 concatenations with the + operator.
        for (int i = 0; i < 100000; i++) {
            String s = "Blah" + i + "Blah";
        }
        time = System.currentTimeMillis() - prev_time;
        System.out.println("Time after for loop " + time);

        // Time 100,000 String.format calls.
        prev_time = System.currentTimeMillis();
        for (int i = 0; i < 100000; i++) {
            String s = String.format("Blah %d Blah", i);
        }
        time = System.currentTimeMillis() - prev_time;
        System.out.println("Time after for loop " + time);
    }
}

Running the above for different N shows that both behave linearly, but String.format is 5-30 times slower.

The reason is that in the current implementation String.format first parses the input with regular expressions and then fills in the parameters. Concatenation with plus, on the other hand, gets optimized by javac (not by the JIT) and uses StringBuilder.append directly.

[Chart: runtime comparison]

kritzikratzi
  • 19,662
  • 1
  • 29
  • 40
hhafez
  • 38,949
  • 39
  • 113
  • 143
  • 14
    There's one flaw with this test in that it's not entirely a good representation of all string formatting. Often there's logic involved in what to include and logic to format specific values into strings. Any real test should look at real-world scenarios. – Orion Adrian Feb 10 '09 at 17:27
  • Wouldn't using a StringBuffer (Thread safe) or StringBuilder (faster than StringBuffer but not thread safe) better yet than using concatenation ("+")? – Tone Jun 17 '11 at 17:53
  • 9
    There was another question on SO about + versus StringBuffer. In recent versions of Java, + was replaced with StringBuffer when possible, so the performance wouldn't be different. – hhafez Jun 19 '11 at 23:28
  • 1
    ran this test with %s instead of %d, to remove localization from the equation. didn't matter. – Barett Mar 13 '12 at 00:32
  • 26
    This looks a lot like the sort of microbenchmark that is going to be optimized away in a very unuseful manner. – David H. Clements Sep 16 '12 at 04:34
  • 22
    Another poorly implemented micro-benchmark. How do both methods scale by orders of magnitude? How about using 100, 1000, 10000, 1000000 operations? If you only run one test, at one order of magnitude, on an application that isn't running on an isolated core, there's no way to tell how much of the difference can be written off as 'side-effects' due to context switching, background processes, etc. – Evan Plaice Feb 22 '14 at 10:58
  • 9
    Moreover as you don't ever get out of main JIT cannot kick in. – Jan Zyka Jun 20 '14 at 08:13
  • 2
    @EvanPlaice The OP isn't using 100, 1000, 10000, or 1000000 operations in a single line, so that wouldn't be directly relevant to his question. – mbomb007 Jun 16 '15 at 20:15
  • 3
    Just wanted to add that String.format also uses a StringBuilder internally. (at least in openjdk8). So it basically is locale/regex/formatting overhead + StringBuilder vs. only StringBuilder. Even if the benchmark is poorly implemented. I think it is safe to say that doing just "x" is faster than doing "y" + "x" – mfussenegger Jan 08 '16 at 19:47
  • I know this might sound like a weird question, and it probably is, but how did you create this chart? – aks Oct 30 '18 at 00:39
34

All the benchmarks presented here have some flaws, so their results are not reliable.

I was surprised that nobody used JMH for benchmarking, so I did.

Results:

Benchmark             Mode  Cnt     Score     Error  Units
MyBenchmark.testOld  thrpt   20  9645.834 ± 238.165  ops/s  // using +
MyBenchmark.testNew  thrpt   20   429.898 ±  10.551  ops/s  // using String.format

Units are operations per second, the more the better. Benchmark source code. OpenJDK IcedTea 2.5.4 Java Virtual Machine was used.

So, old style (using +) is much faster.

Adam Stelmaszczyk
  • 19,665
  • 4
  • 70
  • 110
21

Your old ugly style is automatically compiled by javac 1.6 as:

StringBuilder sb = new StringBuilder("What do you get if you multiply ");
sb.append(varSix);
sb.append(" by ");
sb.append(varNine);
sb.append("?");
String s =  sb.toString();

So there is absolutely no difference between this and using a StringBuilder.

String.format is a lot more heavyweight since it creates a new Formatter, parses your input format string, creates a StringBuilder, appends everything to it and calls toString().
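To make the comparison concrete, here is a rough sketch of the two paths side by side. The class and method names are hypothetical, and withFormatter only approximates the steps described above; it is not a copy of the JDK source. Locale.ROOT is an assumption added so the output does not depend on the machine's default locale.

```java
import java.util.Formatter;
import java.util.Locale;

class FormatVersusBuilder {

    // Roughly what javac 1.6 generates for the + version: plain appends, no parsing.
    static String withBuilder(int varSix, int varNine) {
        return new StringBuilder("What do you get if you multiply ")
                .append(varSix)
                .append(" by ")
                .append(varNine)
                .append("?")
                .toString();
    }

    // Approximation of the String.format path: a new Formatter parses the
    // format string, writes into a StringBuilder, then toString() copies it.
    static String withFormatter(int varSix, int varNine) {
        StringBuilder sb = new StringBuilder();
        try (Formatter formatter = new Formatter(sb, Locale.ROOT)) {
            formatter.format("What do you get if you multiply %d by %d?", varSix, varNine);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(withBuilder(6, 9));
        System.out.println(withFormatter(6, 9));
    }
}
```

Both methods produce the same string; the second one simply does more work to get there.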

Raphaël
  • 3,646
  • 27
  • 28
  • In terms of readability, the code you posted is much more...cumbersome than String.format( "What do you get if you multiply %d by %d?", varSix, varNine); – dusktreader Aug 23 '12 at 17:34
  • 17
    No difference between `+` and `StringBuilder`indeed. Unfortunately there's a lot of misinformation in other answers in this thread. I'm almost tempted to change the question to `how should I not be measuring performance`. – CurtainDog Apr 09 '13 at 06:27
12

Java's String.format works like so:

  1. It parses the format string, exploding it into a list of format chunks.
  2. It iterates over the format chunks, rendering them into a StringBuilder, which is basically an array that resizes itself as necessary by copying into a new array. This is necessary because we don't yet know how large to allocate the final String.
  3. StringBuilder.toString() copies its internal buffer into a new String.

If the final destination for this data is a stream (e.g. rendering a webpage or writing to a file), you can assemble the format chunks directly into your stream:

new PrintStream(outputStream, autoFlush, encoding).format("hello %s", "world");

I speculate that the optimizer will optimize away the format string processing. If so, you're left with equivalent amortized performance to manually unrolling your String.format into a StringBuilder.
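A small self-contained sketch of formatting straight into a stream, assuming a ByteArrayOutputStream as a stand-in for a real file or socket stream; the class and method names are hypothetical. Note that PrintStream.format takes %-style specifiers such as %s, not MessageFormat-style {0} placeholders.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class StreamFormatDemo {

    // Formats directly into the stream and returns what the stream received.
    static String formatToStream(String name) {
        // Stand-in for a file or socket stream (an assumption for this demo).
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            PrintStream printer = new PrintStream(out, true, "UTF-8");
            // No intermediate String is returned to the caller here;
            // the formatted text goes to the stream.
            printer.format("hello %s", name);
            printer.flush();
            return out.toString("UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 should always be available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formatToStream("world"));
    }
}
```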

Dustin Getz
  • 21,282
  • 15
  • 82
  • 131
  • 5
    I don't think your speculation about optimisation of the format string processing is correct. In some real-world tests using Java 7, I found that using `String.format` in inner loops (running millions of times) resulted in more than 10% of my execution time spent in `java.util.Formatter.parse(String)`. This seems to indicate that in inner loops, you should avoid calling `Formatter.format` or anything which calls it, including `PrintStream.format` (a flaw in Java's standard lib, IMO, especially since you can't cache the parsed format string). – Andy MacKinlay Dec 22 '14 at 05:41
8

To expand/correct on the first answer above, it's not translation that String.format would help with, actually.
What String.format will help with is printing a date/time (or a numeric format, etc.) where there are localization (l10n) differences (i.e., some countries will print 04Feb2009 and others will print Feb042009).
With translation, you're just talking about moving any externalizable strings (like error messages and what-not) into a property bundle so that you can use the right bundle for the right language, using ResourceBundle and MessageFormat.
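A short sketch of that distinction, with hypothetical class and method names. The patterns passed to MessageFormat are hard-coded here as stand-ins; in a real application they would come from a ResourceBundle for the user's language.

```java
import java.text.MessageFormat;
import java.util.Locale;

class LocaleFormatDemo {

    // Localization: the same value rendered with locale-specific grouping separators.
    static String grouped(Locale locale, int value) {
        return String.format(locale, "%,d", value);
    }

    // Translation: the pattern decides the word order, so a translated pattern
    // can reorder the arguments without any change to the calling code.
    static String translated(String pattern, String first, String second) {
        return MessageFormat.format(pattern, first, second);
    }

    public static void main(String[] args) {
        System.out.println(grouped(Locale.US, 1234567));
        System.out.println(grouped(Locale.GERMANY, 1234567));
        System.out.println(translated("Multiply {0} by {1}", "six", "nine"));
        System.out.println(translated("{1} times {0}", "six", "nine"));
    }
}
```

The first two lines differ only in the grouping separator (comma versus period), while the last two show the same arguments placed in a different order by the pattern alone.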

Looking at all the above, I'd say that performance-wise, String.format vs. plain concatenation comes down to what you prefer. If you prefer looking at calls to .format over concatenation, then by all means, go with that.
After all, code is read a lot more than it's written.

dw.mackie
  • 2,025
  • 1
  • 17
  • 18
  • 1
    *I'd say that performance-wise, String.format vs. plain concatenation comes down to what you prefer* I think this is incorrect. Performance-wise, concatenation is much better. For more details please take a look on my answer. – Adam Stelmaszczyk Jun 26 '15 at 19:50
7

In your example, performance probably isn't too different, but there are other issues to consider, namely memory fragmentation. Each concatenation operation creates a new string, even if it's temporary (it takes time to GC it and it's more work). String.format() is just more readable and it involves less fragmentation.

Also, if you're using a particular format a lot, don't forget you can use the Formatter() class directly (all String.format() does is instantiate a one use Formatter instance).
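A sketch of using one Formatter for several calls, with hypothetical names. This avoids creating a Formatter per call, but as far as I know the format string is still parsed on every format() call, so treat it as a modest saving at best. Locale.ROOT is an assumption to keep the output independent of the default locale.

```java
import java.util.Formatter;
import java.util.Locale;

class ReusedFormatterDemo {

    // Renders every pair with the same format through a single Formatter
    // that appends to one shared StringBuilder.
    static String render(int[][] pairs) {
        StringBuilder sb = new StringBuilder();
        try (Formatter formatter = new Formatter(sb, Locale.ROOT)) {
            for (int[] pair : pairs) {
                formatter.format("%d x %d = %d; ", pair[0], pair[1], pair[0] * pair[1]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new int[][] {{6, 9}, {2, 3}}));
    }
}
```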

Also, something else you should be aware of: be careful of using substring(). For example:

String getSmallString() {
  String largeString = loadLargeString(); // hypothetical helper: load from file; say 2M in size
  return largeString.substring(100, 300);
}

That large string is still in memory, because in Java versions before 7u6 substring() shares the parent string's backing array. A better version is:

  return new String(largeString.substring(100, 300));

or

  return String.format("%s", largeString.substring(100, 300));

The second form is probably more useful if you're doing other stuff at the same time.

ErikE
  • 48,881
  • 23
  • 151
  • 196
cletus
  • 616,129
  • 168
  • 910
  • 942
  • 8
    Worth pointing out the "related question" is actually C# and hence not applicable. – Air Feb 10 '09 at 16:37
  • which tool did you use to measure memory fragmentation and does fragmentation even make a speed difference for ram? – kritzikratzi Jun 16 '15 at 20:15
  • It is worth pointing out that the substring method was changed in Java 7+. It now returns a new String containing only the substringed characters, which means there is no need to wrap the result in new String(...). – João Rebelo Oct 18 '16 at 10:29
5

Generally you should use String.format because it's relatively fast and it supports globalization (assuming you're actually trying to write something that is read by the user). It also makes it easier to globalize if you're trying to translate one string versus 3 or more per statement (especially for languages that have drastically different grammatical structures).

Now if you never plan on translating anything, then either rely on Java's built-in conversion of + operators into StringBuilder, or use Java's StringBuilder explicitly.

Orion Adrian
  • 19,053
  • 13
  • 51
  • 67
3

Another perspective, from a logging point of view only.

I see a lot of discussion related to logging in this thread, so I thought I'd add my experience as an answer. Maybe someone will find it useful.

I guess the motivation for logging with a formatter is to avoid string concatenation. Basically, you do not want the overhead of string concatenation if you are not going to log the message.

You do not really need to concatenate or format unless you are going to log. Let's say I define a method like this:

public void logDebug(Throwable t, String... args) { // varargs must be the last parameter
    if(debugOn) {
       // call concat methods for all args
       //log the final debug message
    }
}

In this approach the concat/formatter code is not called at all if it's a debug message and debugOn = false.

Though it will still be better to use StringBuilder instead of formatter here. The main motivation is to avoid any of that.

At the same time I do not like adding "if" block for each logging statement since

  • It affects readability.
  • It reduces coverage in my unit tests, which is confusing when you want to make sure every line is tested.

Therefore I prefer to create a logging utility class with methods like above and use it everywhere without worrying about performance hit and any other issues related to it.
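A minimal sketch of such a utility, under stated assumptions: the class name DebugLog, the debugOn flag and the in-memory sink are all hypothetical stand-ins, and the parameter order puts the Throwable first because Java requires the varargs parameter to be last.

```java
class DebugLog {
    // Hypothetical switch; a real utility would ask the logging framework.
    static boolean debugOn = false;

    // Stand-in for a real log destination, so the effect is easy to inspect.
    static final StringBuilder sink = new StringBuilder();

    // Joins the parts only when debugging is on; otherwise returns immediately.
    static void logDebug(Throwable t, Object... parts) {
        if (!debugOn) {
            return; // no concatenation and no formatting happens on this path
        }
        StringBuilder sb = new StringBuilder();
        for (Object part : parts) {
            sb.append(part);
        }
        if (t != null) {
            sb.append(" (").append(t).append(')');
        }
        sink.append(sb).append('\n');
    }

    public static void main(String[] args) {
        logDebug(null, "Multiply ", 6, " by ", 9); // ignored: debugOn is false
        debugOn = true;
        logDebug(null, "Multiply ", 6, " by ", 9); // appended to the sink
        System.out.print(sink);
    }
}
```

One caveat with this design: the arguments are still evaluated and the varargs array is still allocated at every call site, so it removes the cost of building the string but not the cost of computing an expensive argument.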

  • Could you leverage an existing library like slf4j-api which purports to address this usecase with their parameterized logging feature? https://www.slf4j.org/faq.html#logging_performance – ammianus Mar 06 '18 at 16:59
2

I just modified hhafez's test to include StringBuilder. StringBuilder is 33 times faster than String.format using jdk 1.6.0_10 client on XP. Using the -server switch lowers the factor to 20.

public class StringTest {

   public static void main( String[] args ) {
      test();
      test();
   }

   private static void test() {
      int i = 0;
      long prev_time = System.currentTimeMillis();
      long time;

      for ( i = 0; i < 1000000; i++ ) {
         String s = "Blah" + i + "Blah";
      }
      time = System.currentTimeMillis() - prev_time;

      System.out.println("Time after for loop " + time);

      prev_time = System.currentTimeMillis();
      for ( i = 0; i < 1000000; i++ ) {
         String s = String.format("Blah %d Blah", i);
      }
      time = System.currentTimeMillis() - prev_time;
      System.out.println("Time after for loop " + time);

      prev_time = System.currentTimeMillis();
      for ( i = 0; i < 1000000; i++ ) {
         new StringBuilder("Blah").append(i).append("Blah");
      }
      time = System.currentTimeMillis() - prev_time;
      System.out.println("Time after for loop " + time);
   }
}

While this might sound drastic, I consider it to be relevant only in rare cases, because the absolute numbers are pretty low: 4 s for 1 million simple String.format calls is sort of ok - as long as I use them for logging or the like.

Update: As pointed out by sjbotha in the comments, the StringBuilder test is invalid, since it is missing a final .toString().

The correct speed-up factor from String.format(.) to StringBuilder is 23 on my machine (16 with the -server switch).

the.duckman
  • 6,376
  • 3
  • 23
  • 21
  • 1
    Your test is invalid because it fails to take into account the time eaten up by just having a loop. You should include that and subtract it from all the other results, at a minimum (yes it can be a significant percentage). – cletus Feb 05 '09 at 05:25
  • 1
    I did that, the for loop takes 0 ms. But even if it did take time, this would only increase the factor. – the.duckman Feb 09 '09 at 10:08
  • 4
    The StringBuilder test is invalid because it does not call toString() at the end to actually give you a String you can use. I added this and the result is that StringBuilder takes about the same amount of time as +. I'm sure as you increase the number of appends it will eventually become cheaper. – Sarel Botha Feb 10 '09 at 18:34
1

Here is a modified version of hhafez's entry. It includes a StringBuilder option.

public class BLA
{
public static final String BLAH = "Blah ";
public static final String BLAH2 = " Blah";
public static final String BLAH3 = "Blah %d Blah";


public static void main(String[] args) {
    int i = 0;
    long prev_time = System.currentTimeMillis();
    long time;
    int numLoops = 1000000;

    for( i = 0; i< numLoops; i++){
        String s = BLAH + i + BLAH2;
    }
    time = System.currentTimeMillis() - prev_time;

    System.out.println("Time after for loop " + time);

    prev_time = System.currentTimeMillis();
    for( i = 0; i<numLoops; i++){
        String s = String.format(BLAH3, i);
    }
    time = System.currentTimeMillis() - prev_time;
    System.out.println("Time after for loop " + time);

    prev_time = System.currentTimeMillis();
    for( i = 0; i<numLoops; i++){
        StringBuilder sb = new StringBuilder();
        sb.append(BLAH);
        sb.append(i);
        sb.append(BLAH2);
        String s = sb.toString();
    }
    time = System.currentTimeMillis() - prev_time;
    System.out.println("Time after for loop " + time);

}

}

Time after for loop 391
Time after for loop 4163
Time after for loop 227

ANON
  • 11
  • 1
0

Consider using "hello".concat( "world!" ) when concatenating a small number of strings. It could be even better for performance than the other approaches.

If you have more than 3 strings, then consider using StringBuilder, or just String concatenation, depending on the compiler that you use.
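For illustration, a tiny sketch of String.concat next to the + operator; the class and method names are hypothetical. One difference worth knowing: concat only accepts a String and throws a NullPointerException for null, whereas + converts any value (including null) to text.

```java
class ConcatDemo {

    // Two chained concat calls; each call creates a new String.
    static String greet(String name) {
        return "hello ".concat(name).concat("!");
    }

    // The same result with the + operator, which also accepts non-String operands.
    static String greetWithPlus(Object name) {
        return "hello " + name + "!";
    }

    public static void main(String[] args) {
        System.out.println(greet("world"));
        System.out.println(greetWithPlus("world"));
    }
}
```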

Saša
  • 4,416
  • 1
  • 27
  • 41
0

The answer to this depends very much on how your specific Java compiler optimizes the bytecode it generates. Strings are immutable and, theoretically, each "+" operation can create a new one. But, your compiler almost certainly optimizes away interim steps in building long strings. It's entirely possible that both lines of code above generate the exact same bytecode.

The only real way to know is to test the code iteratively in your current environment. Write a quick-and-dirty app that concatenates strings both ways iteratively and see how they time out against each other.

Yes - that Jake.
  • 16,725
  • 14
  • 70
  • 96
  • 1
    The bytecode for the second example *surely* calls String.format, but I'd be horrified if a simple concatenation did. Why would the compiler use a format string which would then have to be parsed? – Jon Skeet Feb 04 '09 at 22:26
  • I used "bytecode" where I should have said "binary code." When it all comes down to jmps and movs, it may well be the exact same code. – Yes - that Jake. Feb 04 '09 at 22:37