2

Let's say I'm constructing a set of Strings where each String is a prefix of the next one. For example, imagine I write a function:

public Set<String> example(List<String> strings) {
    Set<String> result = new HashSet<>();
    String incremental = "";
    for (String s : strings) {
        incremental = incremental + ":" + s;
        result.add(incremental);
    }
    return result;
}

Would it ever be worthwhile to rewrite it to use a StringBuilder rather than concatenation? Obviously that would avoid constructing a new StringBuilder in each iteration of the loop, but I'm not sure whether that would be a significant benefit for large lists or whether the overhead that you normally want to avoid by using StringBuilders in loops is mostly just the unnecessary String constructions.

  • The difference is negligible for small operations (few concatenations), but can greatly speed up the process for large operations. If you notice a performance impact from string concatenation, use a string builder. – Michael Bianconi Jan 17 '20 at 22:51
  • 3
    That depends on your definition of "significant". To determine an answer that isn't arbitrary, compare the runtime difference for your specific application. – LowKeyEnergy Jan 17 '20 at 22:52
  • 3
    The benefit of a StringBuilder is to **avoid** generating a lot of incrementally different strings. If you **want** to generate incrementally different strings (which you apparently do because you're putting them all in a set), then a StringBuilder probably isn't going to help you. – khelwood Jan 17 '20 at 22:54
  • That depends on the size of `List`, Use only StringBuilder when you want to concatenate (or build) a String when you have a large list or list of dynamic size. – fabfas Jan 17 '20 at 23:05
  • 2
    @Alex Fotland: Why don't you measure and compare? – mentallurg Jan 17 '20 at 23:20
  • 1
    @ArvindKumarAvinash that question is definitely not a duplicate. This one is about explicitly converting all of the intermediate results to strings, because they need to be stored in a set. – kaya3 Jan 17 '20 at 23:39

2 Answers2

0

Generally you always want to go for StringBuilder in a loop, as O(n) algorithms turn into O(n^2). However, this already O(n^2). Even required memory usage is O(n^2). It looks as if it wont hugely matter, but perhaps there could be a factor of two performance difference. Also, as you can see from comments, readers are expecting StringBuilder - don't surprise them unnecessarily.

In general, whilst some may say measure, O(n^2) can blow up in situations that don't occur in testing. In any case, who wants to microbenchmark all their code? Avoid big-O inefficiencies as a matter of course.

On some implementations, String.substring would share the backing char[] between original and substring. However, I don't believe this is currently usually done. That doesn't stop you writing your own little String class.

Tom Hawtin - tackline
  • 145,806
  • 30
  • 211
  • 305
0

This answer is only correct for Java 8; as @user85421 points out, + on strings is no longer compiled to StringBuilder operations in Java 9 and later.


Theoretically at least, there is still a reason to use a StringBuilder in your example.

Let's consider how string concatenation works: the assignment incremental = incremental + ":" + s; actually creates a new StringBuilder, appends incremental to it by copying, then appends ":" to it by copying, then appends s to it by copying, then calls toString() to build the result by copying, and assigns a reference to the new string to the variable incremental. The total number of characters copied from one place to another is (N + 1 + s.length()) * 2 where N is the original length of incremental, because of copying every character into the StringBuilder's buffer once, and then back out again once.

In contrast, if you use a StringBuilder explicitly - the same StringBuilder across all iterations - then inside the loop you would write incremental.append(":").append(s); and then explicitly call toString() to build the string to add to the set. The total number of characters copied here would be (1 + s.length()) * 2 + N, because the ":" and s have to be copied in and out of the StringBuilder, but the N characters from the previous state only have to be copied out of the StringBuilder in the toString() method; they don't also have to be copied in, because they were already there.

So, by using a StringBuilder instead of concatenation, you are copying N fewer characters into the buffer on each iteration, and the same number of characters out of the buffer. The value of N grows from initially 0, to the sum of the lengths of all of the strings (plus the number of colons), so the total saving is quadratic in the sum of the lengths of the strings. That means the saving could be quite significant; I'll leave it to someone else to do the empirical measurements to see how significant it is.

kaya3
  • 47,440
  • 4
  • 68
  • 97
  • 1
    in actual implementations (OpenJDK13) the `StringBuilder` is not used for these loop - see [StringConcatFactory](https://docs.oracle.com/javase/9/docs/api/java/lang/invoke/StringConcatFactory.html) – user85421 Jan 17 '20 at 23:55
  • 1
    or [How is String concatenation implemented in Java 9?](https://stackoverflow.com/q/46512888/85421) – user85421 Jan 18 '20 at 00:06
  • @user85421 Thanks for pointing that out; I've put a note at the top of the answer. It doesn't seem like it's going to be easy figuring out what copying is done in and out of the buffer in newer versions of Java. In principle, it could be done without a temporary buffer so only `N + 1 + s.length()` characters would be copied to the new string directly; but I can't find any information about the implementation details to check how it is actually done. – kaya3 Jan 18 '20 at 00:09