7

I'm learning Java and am wondering what's the best way to modify strings here (both for performance and to learn the preferred method in Java). Assume you're looping through a string and checking each character/performing some action on that index in the string.

Do I use the StringBuilder class, or convert the string into a char array, make my modifications, and then convert the char array back to a string?

Example for StringBuilder:

StringBuilder newString = new StringBuilder(oldString);
for (int i = 0; i < oldString.length() ; i++) {
    newString.setCharAt(i, 'X');    
}

Example for char array conversion:

char[] newStringArray = oldString.toCharArray();
for (int i = 0; i < oldString.length(); i++) {
    newStringArray[i] = 'X';
}
myString = String.valueOf(newStringArray);

What are the pros/cons to each different way?

I take it that StringBuilder is going to be more efficient, since converting to a char array makes copies of the array each time you update an index.

Adam Stelmaszczyk
user797963
  • 6
    Please use `StringBuilder`. – Sotirios Delimanolis Oct 09 '13 at 18:16
  • 1
    If I'm not mistaken, `StringBuilder` uses a `char` array internally (similar to yours), so I'd recommend letting the ready-made class do the heavy lifting for you. – Reinstate Monica -- notmaynard Oct 09 '13 at 18:21
  • There are almost never universal answers for questions like this. The best answer is ... "it depends on what you're doing." Strings are immutable and so they are always threadsafe. If you are doing a string manipulation in a concurrent environment on shared state variables, then it may be greatly to your advantage to use String. For thread confined mutable state, as others have pointed out, StringBuilder will get you superior performance. For shared mutable state, StringBuilder will need to be guarded by a lock before a thread can modify it and this can cause performance bottlenecks. – scottb Oct 09 '13 at 18:44

4 Answers

4

I say do whatever is most readable/maintainable until you know that String "modification" is slowing you down. To me, this is the most readable:

String s = "foo";
s += "bar";
s += "baz";

If that's too slow, I'd use a StringBuilder. You may want to compare this to StringBuffer. If performance matters and synchronization does not, StringBuilder should be faster. If synchronization is needed, then you should use StringBuffer.

Also, it's important to know that these strings are not being modified. In Java, Strings are immutable: each += produces a new String object and rebinds the variable to it.
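To make the immutability point concrete, here is a minimal sketch. The class and method names are made up for illustration. It keeps a second reference to the original String and shows that += leaves that object untouched:

```java
class ImmutabilityDemo {
    /** Returns what a second reference sees after the first one is "modified" with +=. */
    static String aliasAfterAppend() {
        String s = "foo";
        String alias = s;  // second reference to the same String object
        s += "bar";        // builds a new String and rebinds s; the old object is unchanged
        return alias;
    }

    public static void main(String[] args) {
        System.out.println(aliasAfterAppend()); // prints "foo"
    }
}
```

If Strings were mutable, the alias would see "foobar" here.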


This is all context specific. If you optimize this code and it doesn't make a noticeable difference (and this is usually the case), then you just thought longer than you had to and you probably made your code more difficult to understand. Optimize when you need to, not because you can. And before you do that, make sure the code you're optimizing is the cause of your performance issue.

Daniel Kaplan
  • 1
    Fun fact: every Java compiler I have seen implements the string concatenation operator using a `StringBuilder` anyway, so in many cases the resulting bytecode should be (more or less) identical. – Mike Strobel Oct 09 '13 at 18:18
  • 2
    @MikeStrobel I always thought it *should* do that. Has it always been the case? Why does `StringBuilder` need to exist if it's used regardless? – Daniel Kaplan Oct 09 '13 at 18:19
  • @mikestrobel I just massively improved parsing performance in one of my applications switching to stringbuilder in certain situations. – tom Oct 09 '13 at 18:22
  • There is no JVM-level instruction that represents a string concatenation. To my knowledge, it has always been emitted as a series of `StringBuilder.append()` calls (or, prior to Java 5, `StringBuffer.append()` calls). I would imagine the results are only "optimal" for concatenations which occur within the same expression, e.g., "a" + 1 + "c". Breaking the operations into separate statements may produce less than optimal results. Also, different compilers may optimize these operations more aggressively than others. – Mike Strobel Oct 09 '13 at 18:34
  • In my case StringBuilder avoided the creation of millions of (lengthy) strings and their concatenation. Admittedly I was parsing thousands of CSV files, so it's situation dependent, but I cut running time by two thirds. – tom Oct 09 '13 at 18:46
  • Yeah, any sort of string building where you are performing disjoint modifications should benefit from using a single StringBuilder. – Mike Strobel Oct 09 '13 at 18:54
1

What are the pros/cons to each different way? I take it that StringBuilder is going to be more efficient since the converting to a char array makes copies of the array each time you update an index.

As written, the code in your second example will create just two arrays: one when you call toCharArray(), and another when you call String.valueOf() (String stores data in a char[] array). The element manipulations you are performing should not trigger any object allocations. There are no copies being made of the array when you read or write an element.
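As a rough illustration of where those two copies happen, here is a sketch with made-up class and method names. The original String is never touched, because toCharArray() hands back a fresh array:

```java
class CharArrayCopyDemo {
    /** Replaces every character with 'X' by working on a copy of the string's characters. */
    static String maskAll(String input) {
        char[] chars = input.toCharArray();   // copy #1: a new array, independent of input
        for (int i = 0; i < chars.length; i++) {
            chars[i] = 'X';                   // plain array write, no allocation
        }
        return String.valueOf(chars);         // copy #2: the new String gets its own storage
    }

    public static void main(String[] args) {
        String original = "hello";
        String masked = maskAll(original);
        System.out.println(original); // prints "hello"
        System.out.println(masked);   // prints "XXXXX"
    }
}
```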

If you are going to be doing any sort of String manipulation, the recommended practice is to use a StringBuilder. If you are writing very performance-sensitive code, and your transformation does not alter the length of the string, then it might be worthwhile to manipulate the array directly. But since you are learning Java as a new language, I am going to guess that you are not working in high frequency trading or any other environment where latency is critical. Therefore, you are probably better off using a StringBuilder.

If you are performing any transformations that might yield a string of a different length than the original, you should almost certainly use a StringBuilder; it will resize its internal buffer as necessary.
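For instance, a transformation that replaces one character with a whole string cannot be done in a fixed-size array without working out the final length first. The sketch below uses a hypothetical helper named expand (not a library method) and lets StringBuilder grow its buffer as needed:

```java
class LengthChangingDemo {
    /** Returns input with every occurrence of target replaced by the replacement string. */
    static String expand(String input, char target, String replacement) {
        // The initial capacity is only a hint; the builder resizes itself when it runs out.
        StringBuilder sb = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == target) {
                sb.append(replacement);
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("a&b", '&', "&amp;")); // prints "a&amp;b"
        System.out.println(expand("a-b-c", '-', ""));    // prints "abc"
    }
}
```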

On a related note, if you are doing simple string concatenation (e.g., s = "a" + someObject + "c"), the compiler will actually transform those operations into a chain of StringBuilder.append() calls, so you are free to use whichever you find more aesthetically pleasing. I personally prefer the + operator. However, if you are building up a string across multiple statements, you should create a single StringBuilder.

For example:

public String toString() {
    return "{field1 =" + this.field1 + 
           ",  field2 =" + this.field2 + 
           ...
           ",  field50 =" + this.field50 + "}";
}

Here, we have a single, long expression involving many concatenations. You don't need to worry about hand-optimizing this, because the compiler will use a single StringBuilder and just call append() on it repeatedly.
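As an illustration only, the two methods below produce the same result: one uses the + operator and the other spells out the append() chain by hand. The class and method names are invented for this sketch, and the exact bytecode the compiler emits for + depends on the compiler version (newer javac releases, from Java 9 on, may use a different strategy than a literal StringBuilder), so treat this as a statement about equivalent results rather than about the generated code:

```java
class ConcatEquivalenceDemo {
    /** Single-expression concatenation with the + operator. */
    static String withOperator(int n) {
        return "a" + n + "c";
    }

    /** The hand-written append() chain that yields the same string. */
    static String withBuilder(int n) {
        return new StringBuilder().append("a").append(n).append("c").toString();
    }

    public static void main(String[] args) {
        System.out.println(withOperator(1)); // prints "a1c"
        System.out.println(withBuilder(1));  // prints "a1c"
    }
}
```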

String s = ...;
if (someCondition) {
    s += someValue;
}
s += additionalValue;
return s;

Here, you'll end up with two StringBuilders being created under the covers, but unless this is an extremely hot code path in a latency-critical application, it's really not worth fretting about. Given similar code, but with many more separate concatenations, it might be worth optimizing. Same goes if you know the strings might be very large. But don't just guess--measure! Demonstrate that there's a performance problem before you try to fix it. (Note: this is just a general rule for "micro optimizations"; there's rarely a downside to explicitly using a StringBuilder. But don't assume it will make a measurable difference: if you're concerned about it, you should actually measure.)

String s = "";
for (final Object item : items) {
    s += item + "\n";
}

Here, we're performing a separate concatenation operation on each loop iteration, which means a new StringBuilder will be allocated on each pass. In this case, it's probably worth using a single StringBuilder since you may not know how large the collection will be. I would consider this an exception to the "prove there's a performance problem before optimizing" rule: if the operation has the potential to explode in complexity based on input, err on the side of caution.
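A sketch of that loop rewritten around one explicit StringBuilder might look like this (joinLines and the sample list are assumptions made up for the example):

```java
import java.util.Arrays;
import java.util.List;

class JoinLinesDemo {
    /** Appends each item followed by a newline, reusing one builder for the whole loop. */
    static String joinLines(Iterable<?> items) {
        StringBuilder sb = new StringBuilder();
        for (Object item : items) {
            sb.append(item).append('\n'); // append(Object) converts via String.valueOf
        }
        return sb.toString();             // one final copy into a String
    }

    public static void main(String[] args) {
        List<Object> items = Arrays.asList("a", 1, 2.5);
        System.out.print(joinLines(items)); // prints a, 1 and 2.5 on separate lines
    }
}
```

The builder is allocated once, so the work grows linearly with the total output size instead of re-copying the accumulated string on every pass.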

Mike Strobel
  • Most JVMs will do the optimization of changing concatenation to a StringBuilder and then back to a String. The important exception is for concatenations that occur within loops. In that case, it is usually important to explicitly do the string manipulations with StringBuilder. – scottb Oct 09 '13 at 18:48
1

Which option will perform the best is not an easy question.

I did a benchmark using Caliper:

                RUNTIME (NS)
array           88
builder         126
builderTillEnd  76
concat          3435

Benchmarked methods:

public static String array(String input)
{
    char[] result = input.toCharArray(); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result[i] = 'X';
    }
    return String.valueOf(result); // COPYING
}

public static String builder(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result.toString(); // COPYING
}

public static StringBuilder builderTillEnd(String input)
{
    StringBuilder result = new StringBuilder(input); // COPYING
    for (int i = 0; i < input.length(); i++)
    {
        result.setCharAt(i, 'X');
    }
    return result;
}

public static String concat(String input)
{
    String result = "";
    for (int i = 0; i < input.length(); i++) 
    {
        result += 'X'; // terrible COPYING, COPYING, COPYING... same as:
                       // result = new StringBuilder(result).append('X').toString();
    }
    return result;
}

Remarks

  1. If we want to modify a String, we have to make at least one copy of the input String, because Strings in Java are immutable.

  2. java.lang.StringBuilder extends java.lang.AbstractStringBuilder. StringBuilder.setCharAt() is inherited from AbstractStringBuilder and looks like this:

    public void setCharAt(int index, char ch) {
        if ((index < 0) || (index >= count))
            throw new StringIndexOutOfBoundsException(index);
        value[index] = ch;
    }
    

    AbstractStringBuilder internally uses a plain char array: char value[]. So result[i] = 'X' is very similar to result.setCharAt(i, 'X'); however, the latter goes through a method call (which the JVM will probably inline) and an explicit bounds check against the builder's current length, so it will be a bit slower.
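One consequence of that bounds check is worth seeing in isolation: setCharAt() validates the index against the builder's length (count), not its capacity. The sketch below uses a made-up helper, setCharAtFails, and catches IndexOutOfBoundsException, the superclass of StringIndexOutOfBoundsException:

```java
class SetCharAtBoundsDemo {
    /** Returns true if setCharAt(index, 'X') is rejected by the builder's bounds check. */
    static boolean setCharAtFails(StringBuilder sb, int index) {
        try {
            sb.setCharAt(index, 'X');
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        StringBuilder empty = new StringBuilder(16);  // capacity 16, but length 0
        System.out.println(setCharAtFails(empty, 0)); // prints "true"

        StringBuilder filled = new StringBuilder("abc");
        System.out.println(setCharAtFails(filled, 2)); // prints "false"
        System.out.println(filled);                    // prints "abX"
    }
}
```

So a builder created with only a capacity has to be filled with append() (or setLength()) before setCharAt() can be used on it.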

Conclusions

  1. If you can operate on StringBuilder until the end (you don't need String back) - do it. It's the preferred way and also the fastest. Simply the best.

  2. If you want a String in the end and this is the bottleneck of your program, then you might consider using a char array. In the benchmark above, the char array version was noticeably faster than StringBuilder (88 ns versus 126 ns). Be sure to properly measure the execution time of your program before and after the optimization, because there is no guarantee that this gain carries over to your code.

  3. Never concatenate Strings in a loop with + or +=, unless you really know what you are doing. Usually it's better to use an explicit StringBuilder and append().

Adam Stelmaszczyk
0

I'd prefer to use the StringBuilder class when the original string needs to be modified.

For String manipulation, I like the StringUtils class. You'll need to add the Apache Commons Lang dependency to use it.

Charu Khurana