
It is well known that String.format() performance is terrible. I see a big possible improvement in my typical (and probably very common) case: I print the same structure of data many times. Let's imagine a structure like "x:%d y:%d z:%d". I expect that the main trouble with String.format() is that it always has to parse the format string. My question is: is there some ready-made class that would parse the format string only once and then quickly produce the string once the variable parameters are filled in? Usage should look like this:

PreString ps = new PreString("x:%d y:%d z:%d");
String s;
for(int i=0;i<1000;i++){
    s = ps.format(i,i,i); 
}

I know it is possible. Below is my quick & dirty example, which does what I'm talking about and is about 10 times faster on my machine:

public interface myPrintable{
    boolean isConst();
    String prn(Object o);
    String prn();
}

public class MyPrnStr implements myPrintable{
    String s;
    public MyPrnStr(String s){this.s =s;}
    @Override public boolean isConst() { return true; }
    @Override public String prn(Object o) { return s; }
    @Override public String prn() { return s; }
}

public class MyPrnInt implements myPrintable{
    public MyPrnInt(){}
    @Override  public boolean isConst() { return false; }
    @Override  public String prn(Object o) { return String.valueOf((Integer)o);  }
    @Override  public String prn() { return "NumMissing";   }
}

public class FastFormat{
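    myPrintable[]      obj    = new myPrintable[100];  // fixed capacity: at most 100 segments
    int                objIdx = 0;                     // number of segments added so far
    StringBuilder      sb     = new StringBuilder();   // reused across calls, so not thread-safe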

    public FastFormat() {}

    public void addObject(myPrintable o) {  obj[objIdx++] = o;   }

    public String format(Object... par) {
        sb.setLength(0);
        int parIdx = 0;
        for (int i = 0; i < objIdx; i++) {
            if(obj[i].isConst()) sb.append(obj[i].prn());
            else                 sb.append(obj[i].prn(par[parIdx++]));
        }
        return sb.toString();
    }
}

It is used like this:

int rpt = 1_000_000; // number of iterations, as in the measurements below
String s;
FastFormat ff = new FastFormat();
ff.addObject(new MyPrnStr("x:"));
ff.addObject(new MyPrnInt());
ff.addObject(new MyPrnStr(" y:"));
ff.addObject(new MyPrnInt());
ff.addObject(new MyPrnStr(" z:"));
ff.addObject(new MyPrnInt());
for (int i = 0; i < rpt; i++) {
    s = ff.format(i,i,i);
}

When I compare it with this:

long beg = System.nanoTime();
for (int i = 0; i < rpt; i++) {
    s = String.format("x:%d y:%d z:%d", i, i, i);
}
long diff = System.nanoTime() - beg;

For 1e6 iterations, pre-formatting improves the result by a factor of ~10:

time [ns]: String.format()      3 458 270 585  (90.73% of total)
time [ns]: FastFormat.format()    353 431 686  ( 9.27% of total)
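The manual addObject() calls above can also be derived from the pattern itself, which is what the PreString usage at the top of the question asks for. Below is a minimal sketch, assuming only %d and %% have to be supported; PreString is the hypothetical name from the question, not an existing class. The pattern is split once into literal segments, so format() only appends:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parse-once formatter supporting only %d and %% (a sketch, not a library class). */
final class PreString {
    // literals.get(k) is the text printed before argument k; the last entry is the trailing text.
    private final List<String> literals = new ArrayList<>();

    PreString(String pattern) {
        StringBuilder lit = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c != '%') {
                lit.append(c);
                continue;
            }
            if (i + 1 >= pattern.length()) {
                throw new IllegalArgumentException("dangling % at end of pattern");
            }
            char spec = pattern.charAt(++i);
            if (spec == '%') {
                lit.append('%');                 // escaped percent sign stays literal text
            } else if (spec == 'd') {
                literals.add(lit.toString());    // close the literal that precedes this argument slot
                lit.setLength(0);
            } else {
                throw new IllegalArgumentException("unsupported specifier %" + spec);
            }
        }
        literals.add(lit.toString());            // trailing literal (may be empty)
    }

    /** Primitive varargs avoid boxing; int arguments are widened to long automatically. */
    String format(long... args) {
        int slots = literals.size() - 1;
        if (args.length != slots) {
            throw new IllegalArgumentException("expected " + slots + " arguments, got " + args.length);
        }
        StringBuilder sb = new StringBuilder(64);
        for (int k = 0; k < slots; k++) {
            sb.append(literals.get(k)).append(args[k]);
        }
        return sb.append(literals.get(slots)).toString();
    }
}
```

With this, the loop from the question (`s = ps.format(i, i, i);`) compiles as written. Unlike String.format(), it always prints ASCII digits and supports no flags or widths such as %5d; adding those is exactly the work the Formatter class already does.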

[EDIT]

As Steve Chaloner replied, there is MessageFormat, which does pretty much what I want. So I tried this code:

MessageFormat mf = new MessageFormat("x:{0,number,integer} y:{1,number,integer} z:{2,number,integer}");
Object[] uo = new Object[3];
for (int i = 0; i < rpt; i++) {
    uo[0]=uo[1]=uo[2] = i;
    s = mf.format(uo);
}

It is faster, but only by a factor of 2, not the factor of 10 I had hoped for. Again, here are the measurements for 1M iterations (JRE 1.8.0_25-b18, 32-bit):

time [s]: String.format()       3.359 146 913  (63.18% of total)
time [s]: FastFormat.format()   0.318 569 218  ( 5.99% of total)
time [s]: MessageFormat         1.639 255 061  (30.83% of total)
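Note that the String.format() and MessageFormat patterns above do not produce identical text: {0,number,integer} goes through a locale-specific integer NumberFormat, which inserts grouping separators, whereas %d does not (a comment further down also points out the locale-dependent output). A small sketch, with the locale fixed to Locale.US only to make the output reproducible; the {0,number,#} subformat is one way to get plain digits:

```java
import java.text.MessageFormat;
import java.util.Locale;

class MessageFormatGrouping {
    // Shared instances are fine in this single-threaded demo; MessageFormat itself is not synchronized.
    // The "integer" style uses NumberFormat.getIntegerInstance(locale), which groups digits.
    static final MessageFormat GROUPED =
            new MessageFormat("x:{0,number,integer} y:{1,number,integer} z:{2,number,integer}", Locale.US);
    // A DecimalFormat subpattern of "#" prints the digits without grouping, like %d.
    static final MessageFormat PLAIN =
            new MessageFormat("x:{0,number,#} y:{1,number,#} z:{2,number,#}", Locale.US);

    public static void main(String[] args) {
        Object[] values = {1234567, 1234567, 1234567};
        System.out.println(GROUPED.format(values)); // x:1,234,567 y:1,234,567 z:1,234,567
        System.out.println(PLAIN.format(values));   // x:1234567 y:1234567 z:1234567
        System.out.println(String.format(Locale.US, "x:%d y:%d z:%d", 1234567, 1234567, 1234567)); // x:1234567 y:1234567 z:1234567
    }
}
```

For small values such as the loop counters in the benchmark the difference only shows up from 1,000 on, so the timings are still comparable, but the outputs are not byte-for-byte equal.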

[EDIT2]

As Slanec replied, there is org.slf4j.helpers.MessageFormatter (I tried library version slf4j-1.7.12).

I compared this code:

Object[] uo2 = new Object[3];
beg = System.nanoTime();
for(long i=rpt;i>0;i--){
    uo2[0]=uo2[1]=uo2[2] = i;
    s = MessageFormatter.arrayFormat("x: {} y: {} z: {}",uo2).getMessage();
}

with the MessageFormat code from the [EDIT] section above, and got the following results for 1M iterations:

Time MessageFormatter [s]: 1.099 880 912
Time MessageFormat    [s]: 2.631 521 135
Speed-up:                  2.393 times

So MessageFormatter is the best answer so far, yet my simple example is still a little bit faster. Can anyone propose a faster ready-made library?
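All timings in this question come from a single System.nanoTime() loop per candidate, which likely includes JIT warm-up and can be distorted if the JIT discards unused results. A slightly less crude harness is sketched below: it runs each candidate once to warm up and folds every result into a field so the work cannot be dropped. CrudeBench is a hypothetical helper; a dedicated tool such as OpenJDK's JMH is still the more reliable choice.

```java
import java.util.function.IntFunction;

/** Hypothetical crude benchmark helper: one warm-up pass, then one timed pass. */
class CrudeBench {
    // Results are folded into this field so the JIT cannot eliminate the formatting as dead code.
    static long sink;

    /** Runs fn for i = 0..iterations-1 and returns the elapsed wall-clock nanoseconds. */
    static long time(IntFunction<String> fn, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += fn.apply(i).length();
        }
        return System.nanoTime() - start;
    }

    /** Warm-up pass (result discarded) followed by the measured pass. */
    static long measure(IntFunction<String> fn, int iterations) {
        time(fn, iterations);
        return time(fn, iterations);
    }

    public static void main(String[] args) {
        int rpt = 1_000_000;
        long viaFormat = measure(i -> String.format("x:%d y:%d z:%d", i, i, i), rpt);
        long viaBuilder = measure(
                i -> new StringBuilder().append("x:").append(i).append(" y:").append(i)
                        .append(" z:").append(i).toString(),
                rpt);
        System.out.printf("String.format(): %,d ns%nStringBuilder:   %,d ns%n", viaFormat, viaBuilder);
    }
}
```

Any of the formatters discussed here can be plugged in as the IntFunction, e.g. `i -> ff.format(i, i, i)`.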

Vit Bernatik
  • How many loop iterations? – GhostCat Apr 20 '15 at 12:33
  • Pattern (regexes) let you do this, I don't see anything for String.format off-hand. – djechlin Apr 20 '15 at 12:33
  • `String#format()` is mostly for debugging and ad-hoc output; generally it doesn't need to be fast. – Alex Salauyou Apr 20 '15 at 12:35
  • You should use a `StringBuilder` if you need performance. – Bubletan Apr 20 '15 at 12:35
  • @SashaSalauyou Well, I would say it is the other way around: String#format() shows bad performance; therefore it should never be used extensively in production code. But that is something one has to **know**; for example, the Javadoc for that method certainly does not mention it. I see it completely differently: "base" methods that **could** be called billions and billions of times should be as fast as possible. Meaning: in my eyes, you are mixing up cause and effect. – GhostCat Apr 20 '15 at 12:40
  • @Jägermeister agree. But `String#format()` doesn't need preparation thus making its usage very simple. It is obvious that it would be slower than another one that needs preparation stage. – Alex Salauyou Apr 20 '15 at 12:50
  • @SashaSalauyou This is interpretation. Turn to a good C++ programmer and ask him about the performance of utility code provided via the STL library. I am pretty sure that he will emphasize that any such utility function has been written with greatest precautions to make it "as efficient" as possible. In their world, "badly performing" is a no-go for base stuff. – GhostCat Apr 20 '15 at 14:17
  • @Jägermeister well... after looking at the `Formatter` source code, I agree. I am disappointed that it uses regex instead of stream parsing there. – Alex Salauyou Apr 20 '15 at 14:29
  • @Jägermeister: I edited the question; I tried 1M iterations. – Vit Bernatik Apr 20 '15 at 16:03
  • @djechlin: I suspect a regexp would be slower than String.format. Am I wrong? Not to mention that for 3 numbers I would need to run it 3 times. Or if I got it wrong, please show me a code snippet. – Vit Bernatik Apr 20 '15 at 16:09
  • @Bubletan: Thanks, that is the obvious answer. But my question was about getting something in the middle: the ease of String.format() but with some way to pre-format for better speed. As you can see, I used StringBuilder in the example showing how I expect a pre-formatting class to behave. Also, I did not mention one detail: I also have a separate logger thread. I would be happy to send it (Formatter f, Object... data), and only if some filter function later decides that the message gets printed would it actually create the message. – Vit Bernatik Apr 20 '15 at 16:17
  • @VitBernatik I don't think there is a good system for that, but it's pretty easy to make one. – Bubletan Apr 20 '15 at 16:27
  • Yes I agree regex is the wrong way to go. I'm just pointing out you're right that there's a common design pattern for your problem, and that you have the right design. – djechlin Apr 20 '15 at 16:33
  • The key point here is flexibility. Of course it's trivial to create something that quickly appends 3 default-formatted `int` values to a string. But if you wanted to extend it to support scientifically notated, right-aligned `BigDecimal` values with a Chinese `Locale`, you'd end up implementing all the details that are now covered by the `Formatter` class. If you described the intended application case more clearly, one could probably think of an optimized solution. – Marco13 Apr 21 '15 at 06:35
  • It's a shame there really is nothing better than MessageFormat. I hope somebody writes a library for this. – Petr Janeček Apr 22 '15 at 00:33
  • Have you tried Java's [String Templates (JEP 430) feature](https://openjdk.org/jeps/430), which appears in JDK 21 as a preview feature? A string template is partially pre-processed. – mernst Aug 04 '23 at 22:00
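The logger-thread idea from Vit Bernatik's comment above (hand over the formatter plus the raw arguments, and format only if a filter accepts the message) can be sketched as follows. DeferredMessage and TagFilteringSink are hypothetical names, and the formatter is just a Function so that any pre-parsed format can be plugged in:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical log event that keeps the formatter and raw arguments instead of a finished String. */
final class DeferredMessage {
    final String tag;
    private final Function<Object[], String> formatter;
    private final Object[] args;

    DeferredMessage(String tag, Function<Object[], String> formatter, Object... args) {
        this.tag = tag;
        this.formatter = formatter;
        this.args = args.clone(); // shallow copy: later changes to the caller's array are not seen
    }

    /** The formatting cost is paid only when this is called. */
    String render() {
        return formatter.apply(args);
    }
}

/** Hypothetical consumer (e.g. running on the logger thread) that renders only accepted tags. */
final class TagFilteringSink {
    private final Predicate<String> accept;
    final List<String> printed = new ArrayList<>();

    TagFilteringSink(Predicate<String> accept) {
        this.accept = accept;
    }

    void consume(DeferredMessage m) {
        if (accept.test(m.tag)) {
            printed.add(m.render()); // rejected messages are never formatted
        }
    }
}
```

Because formatting happens later, possibly on another thread, the arguments should be immutable values (boxed numbers, strings) or copies; the shallow array copy does not protect mutable objects inside it.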

3 Answers


It sounds like you want MessageFormat.

From the documentation:

The following example creates a MessageFormat instance that can be used repeatedly:

 int fileCount = 1273;
 String diskName = "MyDisk";
 Object[] testArgs = {new Long(fileCount), diskName};

 MessageFormat form = new MessageFormat(
     "The disk \"{1}\" contains {0} file(s).");
 System.out.println(form.format(testArgs));
Steve Chaloner
  • I have no idea - there are probably benchmarks out there. But, it satisfies the requirement of pre-compiling the message. Personally, I would go with a StringBuilder. – Steve Chaloner Apr 20 '15 at 12:42
  • There is no "requirement" to pre-compile. It is only an attempt to improve performance. – Thilo Apr 20 '15 at 12:43
  • The requirement comes from the question. "Can I precompile the format string" -> yes. – Steve Chaloner Apr 20 '15 at 12:44
  • FWIW, MessageFormat *does* some compilation of the pattern, so there is at least a chance that this could be reasonably fast. – Thilo Apr 20 '15 at 12:48
  • There's an old (2009 old!) comparison [here](http://jevopisdeveloperblog.blogspot.be/2009/09/stringformat-vs-messageformatformat-vs.html) which indicates terrible performance. That's no indicator of what it's like now, but at some point in the past it totally sucked. – Steve Chaloner Apr 20 '15 at 12:50
  • Also here: http://stackoverflow.com/questions/15358090/performance-issue-java-text-messageformat-format-vs-stringbuilder But I would discount both of these comparisons, as they only use the pattern once. No benefit to having the pre-compilation. – Thilo Apr 20 '15 at 12:54
  • Thanks for the response! I tried it, it is what I want, and it is also faster! Unfortunately not by the expected factor of 10 but only by a factor of 2. See my edited question. So I will keep this question open for now, to see if someone can show me how to make it a little bit faster still. – Vit Bernatik Apr 20 '15 at 16:22
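One caveat when reusing a single pre-parsed MessageFormat, as the Javadoc example above does: the MessageFormat Javadoc states that message formats are not synchronized and recommends separate instances per thread. A minimal sketch of that, assuming one ThreadLocal per pattern is acceptable; PerThreadMessageFormat is a hypothetical wrapper, not a JDK class:

```java
import java.text.MessageFormat;
import java.util.Locale;

/** Hypothetical wrapper: one pre-parsed MessageFormat per thread, since MessageFormat is not synchronized. */
final class PerThreadMessageFormat {
    private final ThreadLocal<MessageFormat> perThread;

    PerThreadMessageFormat(String pattern, Locale locale) {
        // Each thread parses the pattern once, on first use, and afterwards reuses its own instance.
        this.perThread = ThreadLocal.withInitial(() -> new MessageFormat(pattern, locale));
    }

    String format(Object... args) {
        return perThread.get().format(args);
    }
}
```

Used with the documentation's pattern, `new PerThreadMessageFormat("The disk \"{1}\" contains {0} file(s).", Locale.US).format(1273L, "MyDisk")` produces the same text as the Javadoc example, including the grouping separator in "1,273".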

If you're looking for a fast implementation, you need to look outside the JDK. You probably use slf4j for logging anyway, so let's look at its MessageFormatter:

MessageFormatter.arrayFormat("x:{} y:{} z:{}", new Object[] {i, i, i}).getMessage();

On my machine (and a crude and flawed microbenchmark), it's around 1/6 slower than your FastFormat class, and around 5-10 times faster than either String::format or MessageFormat.

Petr Janeček
  • BTW, not only was I not able to reproduce any significant performance improvement with `MessageFormat` on Java 8, it also gives a slightly different result based on the current locale (which might be both an advantage and a disadvantage). – Petr Janeček May 04 '15 at 22:59
  • Thanks for the answer. So far it is the fastest solution. But my measurement (see my edited question) shows it is only 2.4 times faster than MessageFormat, not 5 times as you or SLF4J claim. I ran the 1M-iteration loop multiple times. – Vit Bernatik May 06 '15 at 21:27
  • @VitBernatik Maybe because I could reproduce your MessageFormat improvement. That's the problem with microbenchmarks: they're all flawed, but all of them are flawed differently for everyone, for each machine. Either way, I raised a bug for slf4j for a nicer API and for Guava to release their internal implementation. I hope this will improve; the task is way too common to write manually all the time. – Petr Janeček May 06 '15 at 23:12
  • Yup, there is probably still room for improvement in slf4j. They could allow having the message pre-parsed, as MessageFormat does. Then we could get more speed for common logs. Also, I read that log4j v2 is the fastest one. Do you think it can be used only for message formatting? I have my own logging mechanism (supporting, for example, tags and filtering by tags...). – Vit Bernatik May 07 '15 at 12:22
  • @VitBernatik (In the last message, I meant to write "could not reproduce") ... Either way, you can try log4j2's implementation yourself: [`ParameterizedMessage`](http://logging.apache.org/log4j/2.x/log4j-api/apidocs/org/apache/logging/log4j/message/ParameterizedMessage.html). Again, it does unfortunately not cache/compile the message formats, I'd like to see some implementation do that. If you're not faster than me, I'll add measurements of `ParameterizedMessage` on Monday. – Petr Janeček May 07 '15 at 12:33
  • Hey, I did not find ParametrizedMessage in compiled log4j 2.2. But I found [FormattedMessage](https://logging.apache.org/log4j/2.0/log4j-api/apidocs/org/apache/logging/log4j/message/FormattedMessage.html#FormattedMessage%28java.lang.String,%20java.lang.Object[]%29) and used it as `str = new FormattedMessage("x: {} y: {} z: {}",uo).getFormattedMessage();`, but it is 48 times slower. Probably `new` is not a good thing, but it does not have a static method... Any hint? – Vit Bernatik May 12 '15 at 20:37
  • Hm, it is strange: my IntelliJ auto-completion does not see ParameterizedMessage, but when I use it, it compiles and works. In my measurement it is faster than MessageFormat, but still about 20% slower than slf4j's MessageFormatter. – Vit Bernatik May 12 '15 at 20:45
  • On my PC (and my JDK 8u5 -server) it's slightly faster than slf4j (240 ms vs. 220 ms), both of which are slightly slower than your FastFormat (200 ms). Overall, yes, it seems to be in the same ballpark. We'll need to write it properly ourselves :) – Petr Janeček May 12 '15 at 20:49
  • @VitBernatik I'm deeply playing with my own implementation and found out that the `ParameterizedMessage` is slightly faster than `MessageFormatter` mostly because it doesn't handle array arguments at all and outputs garbage instead. I'll update with a compiling and caching formatter soon. – Petr Janeček May 13 '15 at 01:08

I said I would deliver, and here it is. My pre-compilation-capable string formatting (working proof-of-concept) library: https://gitlab.com/janecekpetr/string-format

Using

StringFormat.format("x:{} y:{} z:{}", i, i, i)

I get very similar numbers to slf4j and log4j2.

However, when using

CompiledStringFormat format = StringFormat.compile("x:{} y:{} z:{}");

// and then, in the loop
format.format(i, i, i)

I get roughly 1/3 better numbers than your FastFormat. Note that at this point, you must be formatting A LOT of strings to get significant differences.
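The library's internals are not reproduced here; the following is only a hypothetical sketch of the same compile-once idea for {} placeholders. It also adopts the appendTo suggestion from maaartinus's comment, so that many messages can be written into one StringBuilder without intermediate strings. CompiledBraces is an invented name, and escaping of a literal {} is not handled:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-compiled "{}"-placeholder format (not the API of the library linked above). */
final class CompiledBraces {
    private final String[] literals; // literals[k] precedes argument k; the last entry is the trailing text

    private CompiledBraces(String[] literals) {
        this.literals = literals;
    }

    /** Splits the pattern at every "{}" exactly once. */
    static CompiledBraces compile(String pattern) {
        List<String> parts = new ArrayList<>();
        int from = 0;
        int at;
        while ((at = pattern.indexOf("{}", from)) >= 0) {
            parts.add(pattern.substring(from, at));
            from = at + 2;
        }
        parts.add(pattern.substring(from));
        return new CompiledBraces(parts.toArray(new String[0]));
    }

    /** Appends directly to a caller-owned builder, avoiding an intermediate String. */
    StringBuilder appendTo(StringBuilder sb, Object... args) {
        int slots = literals.length - 1;
        if (args.length != slots) {
            throw new IllegalArgumentException("expected " + slots + " arguments, got " + args.length);
        }
        for (int k = 0; k < slots; k++) {
            sb.append(literals[k]).append(args[k]); // append(Object) renders via String.valueOf
        }
        return sb.append(literals[slots]);
    }

    String format(Object... args) {
        return appendTo(new StringBuilder(64), args).toString();
    }
}
```

In a loop that builds one large output, `format.appendTo(sb, i, i, i).append('\n')` skips the per-message String entirely; the varargs array and boxing of `i` still remain.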

Petr Janeček
  • When optimizing that hard, you may want to provide `appendTo` instead of (or in addition to) `format`. In the loop, you may want to concatenate all the strings without producing them (i.e., `format.appendTo(sb, i, i, i)` instead of `sb.append(format.format(i, i, i))`). The intermediate strings are probably more costly than boxing and varargs. – maaartinus May 25 '15 at 14:26
  • @maaartinus Thanks for the idea! On a relevant note, one thing I was thinking about today was special treatment of `Collection` classes, adding a custom `appendTo()` method for them internally as well. Your idea slipped my mind, though; I'll gladly implement it. _(Honestly, though, there's probably close to zero real-world usage for such a library. It's an interesting problem nonetheless and I couldn't believe I didn't find any existing implementations.)_ – Petr Janeček May 25 '15 at 14:48
  • The problem is the existence of `String.format`, which is a damn dumb and damn slow remake from C. No matter how bad it is, it covers many use cases, and that's why nobody cared to do something better (i18n, logging and Guava's Preconditions all handle special cases only). Once I had the idea to write a customizable formatter, in which you could plug in handlers for your own classes and format strings, and then call it like `format("time=[t%HH:mm:dd] wtf=[t]", System.currentTimeMillis(), someException)` and get it including the stack trace. It's way faster than `String.format`. – maaartinus May 25 '15 at 15:07
  • Other problems with `String.format` are that it can throw, which is too bad if used for logging only, it can't handle `byte[]`, and it can't use `_` as a thousands separator (Java-source-compatible output is nice, sometimes). This is all doable (and partly done), but the real-world usage is demotivating. Care to [chat](http://chat.stackoverflow.com/rooms/139/java)? I'm just looking at your code. – maaartinus May 25 '15 at 15:23
  • @maaartinus At work, on mobile. I'll gladly join later on, cca in 4 hours, at 19.30 UTC. – Petr Janeček May 25 '15 at 15:38
  • Could you send me an email? my_name_here at gmail. – maaartinus May 26 '15 at 09:05
  • @Slanec Is it possible somehow to format all {} values with one single value? For example `StringFormat.compile("x:{} y:{} z:{}").format(1);` – tower120 Mar 26 '16 at 16:48
  • @tower120 It's definitely possible to change the library to do that, but I'd advise against it for several reasons: **1.** I don't see a real purpose for that. Are you sure you need to format the same argument into a string repeatedly? What for? Definitely not logging... **2.** It masks potential errors. If you forgot to add an argument, it would not be easy to see the mistake. **3.** It's unintuitive magic against conventions. What should `StringFormat.format("x:{} y:{} z:{}", 1, 2)` print? There are two equally good answers with your approach, and therefore it's error-prone. – Petr Janeček Apr 04 '16 at 17:24
  • @Slanec Well, for example, I have a kind of MVVM framework. In it I have a bind Property function which binds a Property value to some control in the View (in my case that's Android's TextView), and I want not just to bind that value but to have it somehow decorated. There is always exactly one value, so you can't miss. It looks like `property.bind(textView, "The result is {}. {} is too high.")` – tower120 Apr 05 '16 at 17:52
  • @tower120 Aha, I see. So it looks like the best solution for you are the backreferences that [`String.format`](http://docs.oracle.com/javase/8/docs/api/java/util/Formatter.html#syntax) and/or [`MessageFormat`](http://docs.oracle.com/javase/7/docs/api/java/text/MessageFormat.html#patterns) offer. As far as I know, no current logging library offers that out of the box (log4j2 could be extended via a plugin to do that), my little POC library doesn't either. Yet. – Petr Janeček Apr 06 '16 at 11:33
  • @Slanec Well, actually I just modified your library by adding 4 new methods :) Can send you if you need them. – tower120 Apr 06 '16 at 11:58
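For the use case in the last few comments (one value repeated in several places), the backreference syntax mentioned above already works with the JDK classes: %1$d in Formatter syntax refers to the first argument each time it appears, and {0} may appear several times in a MessageFormat pattern. A short sketch using tower120's example message; the Locale.US argument is there only to make the output reproducible:

```java
import java.text.MessageFormat;
import java.util.Locale;

class RepeatedArgument {
    // %1$d is an explicit argument index: every occurrence reads the first argument.
    static String viaFormatter(int value) {
        return String.format(Locale.US, "The result is %1$d. %1$d is too high.", value);
    }

    // In MessageFormat, the same argument index {0} can simply be used more than once.
    private static final MessageFormat MF =
            new MessageFormat("The result is {0}. {0} is too high.", Locale.US);

    static String viaMessageFormat(int value) {
        synchronized (MF) { // MessageFormat is not synchronized, so guard the shared instance
            return MF.format(new Object[] {value});
        }
    }

    public static void main(String[] args) {
        System.out.println(viaFormatter(42));     // The result is 42. 42 is too high.
        System.out.println(viaMessageFormat(42)); // The result is 42. 42 is too high.
    }
}
```

Neither variant needs a library change, although the MessageFormat version will add grouping separators for values of 1,000 and above.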