8

Possible Duplicate:
java String concatenation

How to Improve performance of this chunk of code :

public static String concatStrings(Vector strings) {
    String returnValue = "";

    Iterator iter = strings.iterator();
    while( iter.hasNext() ) {
        returnValue += (String)iter.next();
    }

    return returnValue;
}
Community
  • 1
  • 1
Rachel
  • 100,387
  • 116
  • 269
  • 365

8 Answers8

15

You might look at using StringBuilder, rather than doing += with the individual strings. Strings are immutable in Java, which means that once you create a String object, you cannot modify it. Using += on strings in a loop will result in the creation of many separate String instances, which may create performance problems. StringBuilder can concatenate Strings without having to create new instances, which may save some time, depending on the exact scenario.

Andy White
  • 86,444
  • 48
  • 176
  • 211
  • 1
    The `+=` implicitly creates a **new** String everytime. – BalusC May 05 '10 at 01:48
  • Can you put your explanation in code ? – Rachel May 05 '10 at 01:52
  • Create `StringBuilder` before loop, call `append()` method inside loop, use `toString()` after loop to get result. – BalusC May 05 '10 at 01:54
  • I can't put it in code, but it's pretty easy. Just create a new StringBuilder, and use the "append" method in your loop to add new strings to the buffer. Once you're done, just call stringBuilder.toString() to get the final result. – Andy White May 05 '10 at 01:56
8
public static String concatStrings(List<String> strings) {
    StringBuilder sb = new StringBuilder();
    for (String s : strings) {
       sb.append(s);
    }    
    return sb.toString();
}

Some remarks:

  • Use StringBuilder whenever you need to build a string in a loop
    • + is fine for simple concatenation, but horrible for incremental build
  • Whenever possible, use for-each for readability
  • java.util.Vector is synchronized; if you don't need this (costly) feature, just use an ArrayList.

Don't use raw types

  • JLS 4.8 Raw Types

    The use of raw types is allowed only as a concession to compatibility of legacy code. The use of raw types in code written after the introduction of genericity into the Java programming language is strongly discouraged. It is possible that future versions of the Java programming language will disallow the use of raw types.

  • Effective Java 2nd Edition: Item 23: Don't use raw types in new code

    If you use raw types, you lose all the safety and expressiveness benefits of generics.

See also

Community
  • 1
  • 1
polygenelubricants
  • 376,812
  • 128
  • 561
  • 623
6

As suggested by other answers, using a StringBuilder would probably be a better option.

The code given in the question will actually be compiled (with Sun's javac) to something along the line of the following:

public static String concatStrings(Vector strings) {
    String returnValue = "";

    Iterator iter = strings.iterator();
    while( iter.hasNext() ) {
        String str = (String)iter.next();

        StringBuilder sb = new StringBuilder(returnValue);
        sb.append(str);

        returnValue = sb.toString();
    }

    return returnValue;
}

The compiler will change the += string concatenation with one that uses a StringBuilder. However, the compiler will probably rewrite the code inside the loop so a new StringBuilder instance will be created on each iteration, which is not very performance-friendly.

Therefore, in this case, it would probably be a better idea to create a StringBuilder outside the loop on our own, and perform manual string concatenation:

public static String concatStrings(Vector strings) {
    StringBuidler returnValueBuilder;

    Iterator iter = strings.iterator();
    while( iter.hasNext() ) {
        returnValueBuilder.append((String)iter.next());
    }

    return returnValueBuilder.toString();
}
coobird
  • 159,216
  • 35
  • 211
  • 226
  • +1 for the fact that the compiler will optimise concatination in certain circumstances i.e. '+' isn't always evil. However in cases like the one presented in the question the overhead is not in creating new StringBuilder instances (object allocation is usually very fast) but in copying the StringBuilder buffer into the String before returning the new String (String guarauntees immutability so they can't reuse the same buffer) – CurtainDog May 05 '10 at 02:10
  • @CurtainDog - IIRC, `StringBuilder.toString()` cleverly avoids copying the character array when creating the result `String`. Take a look at the code. – Stephen C May 05 '10 at 09:19
  • @Stephen C: That's definitely not true of the latest version. In general, the Java standard library is very careful to avoid any conceivable situation whereby a String could be accidentally mutated. See http://google.com/codesearch/p?hl=en#gbgUBv1WUj4/src/share/classes/java/lang/StringBuilder.java&q=package:jdk%20file:StringBuilder.java%20toString&sa=N&cd=1&ct=rc – David May 05 '10 at 11:27
  • @David - I learned something new today! Apparently the change happened between 1.4.x and 1.5 ... http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6219959 – Stephen C May 05 '10 at 12:03
  • @Stephen C - thanks, you've made me feel quite young... all my serious java coding must've come post 1.5. Of course, I see that my initial comment is inaccurate, for 'can't reuse' read 'find it problematic to reuse' – CurtainDog May 06 '10 at 07:25
4
private static final int AVERAGE_STRING_LENGTH = 10;  // Using 10 is arbitrary

public static final String concatStrings(final Collection<String> strings) {
    if (strings == null)   return null;

    final int size = strings.size();
    if (size == 0)         return "";
    if (size == 1)         return strings.get(0);

    final StringBuilder returnValue =
        new StringBuilder(AVERAGE_STRING_LENGTH * size);

    for (String s : strings) {
        returnValue.append(s);
    }

    return returnValue.toString();
}

Perhaps going a little overboard, here's every optimization I could think of for concatStrings() - demonstrated above - some of which may not be applicable to your environment:

  • Use StringBuilder - its far more efficient for these successive concatenations
  • Use StringBuilder(int capacity) to specify the likely necessary capacity, if there's any way to anticipate it (used average size above, but other methods may be more convenient)
  • Use the Collection parameter type to allow for a more efficient data structure than Vector, which is synchronized - plus the caller has far greater flexibility (e.g. no need to copy a Set<String> to a Vector<String> just to call this method)
  • Hardcode simple cases, if they are likely (e.g. null, size 0, and size 1 cases above)
  • Use final to facilitate JIT inlining and optimization
  • Cache the size of strings, if its used multiple times. (e.g. used 3 times in the above code.)

Finally, if this operation is done very often over a large number of strings, look into Ropes for Java.

Bert F
  • 85,407
  • 12
  • 106
  • 123
1

Also if you are looking to make this faster, you could refactor the code to use ArrayList instead of Vector. ArrayList is not thread safe, so it is slightly faster than Vector (depends on the situation, may be 0% difference, may be 5% difference).

bwawok
  • 14,898
  • 7
  • 32
  • 43
1

You are creating a string every time you call +=. For example

String theString = "1"; //Makes an immutable String object "1"
theString +="2"; //Makes a new immutable String object "12"
theString +="3"; //makes a new immutable String object "123"

Using a string builder avoids this problem.

StringBuilder sb = new StringBuilder("1"); //Makes a StringBuilder object holding 1
sb.append("2"); //The same StringBuilder object now has "12" in it.
sb.append("3"); //The same StringBuidler object now has "123" in it. 
String theString = sb.toString(); //Creates a new String object with "123" in it 

Note how in the first example we made all of those intermediate strings where in the second example we only created the StringBuilder and the final String (in both examples we created "1" "2" and "3" when we used them as arguments). You can see that there are fewer objects created in the first example, and if you're doing a lot of appending to the String, you can imagine how that adds up!

JF.
  • 163
  • 1
  • 1
  • 7
1

In addition to using a StringBuilder, you can walk the list of Strings beforehand and calculate the exact size of needed for the StringBuilder. Then pass this value into the StringBuilder constructor. Note that this would fall into the category of premature optimisation but you did ask for performance... (You should look at the code for growing the StringBuilder/StringBuffer buffers, its educational)

CurtainDog
  • 3,175
  • 21
  • 17
1

Apart from using ArrayList and StringBuilder, let's consider this.

In modern computer science paradigm, space can be almost always traded for time(perhaps, it's a subjective statement). For the given scenario, with the below code, an extra space of O(N) where N = no of strings(for the new buffer that holds list.toArray()) is used. This is better than using Iterator at least, (open AbstractList.iterator()). Importantly, the time-complexity is significantly better, by computing the concatenation of two strings at once, in one iteration, thus reducing the number of iterations by half! It's something like using Dynamic Programming approach(Remember, computing Fibonacci nos using Dynamic Programming)!!

    StringBuilder sb = new StringBuilder();
    Object[] o = list.toArray();
    //For even no of Strings
    if(o.length % 2 == 0){
        concatFaster(sb, o);
    } else {
        //For odd no of Strings
        concatFaster(sb, o);
        sb.append(o[o.length-1]); // For the odd index
    }

    public static void concatFaster(StringBuilder sb, Object[] o) {
    for (int i = 0; i < o.length - 1; i+=2) {
        sb.append(o[i]).append(o[i+1]);
    }
}
Sanjeev Kulkarni N
  • 341
  • 1
  • 4
  • 14