51

When I compared performance of Apache's StringUtils.replace() vs String.replace() I was surprised to know that the former is about 4 times faster. I used Google's Caliper framework to measure performance. Here's my test

public class Performance extends SimpleBenchmark {
    String s = "111222111222";

    public int timeM1(int n) {
        int res = 0;
        for (int x = 0; x < n; x++) {
            res += s.replace("111", "333").length();
        }
        return res;
    }

    public int timeM2(int n) {
        int res = 0;
        for (int x = 0; x < n; x++) {
            res += StringUtils.replace(s, "111", "333", -1).length();
        }
        return res;
    }

    public static void main(String... args) {
        Runner.main(Performance.class, args);
    }
}

output

 0% Scenario{vm=java, trial=0, benchmark=M1} 9820,93 ns; ?=1053,91 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M2} 2594,67 ns; ?=58,12 ns @ 10 trials

benchmark   us linear runtime
       M1 9,82 ==============================
       M2 2,59 =======

Why is that? Both methods seem to do the same work, StringUtils.replace() is even more flexible.

Prizoff
  • 4,486
  • 4
  • 41
  • 69
Evgeniy Dorofeev
  • 133,369
  • 30
  • 199
  • 275

5 Answers5

64

In modern Java, this is not the case anymore. String.replace was improved in Java-9 moving from regular expression to StringBuilder, and was improved even more in Java-13 moving to direct allocation of the target byte[] array calculating its exact size in advance. Thanks to internal JDK features used, like the ability to allocate an uninitialized array, ability to access String coder and ability to use private String constructor which avoids copying, it's unlikely that current implementation can be beaten by a third-party implementation.

Here are my benchmarking results for your test using JDK 8, JDK 9 and JDK 13 (caliper:0.5-rc1; commons-lang3:3.9)

Java 8 (4x slower indeed):

 0% Scenario{vm=java, trial=0, benchmark=M1} 291.42 ns; σ=6.56 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M2} 70.34 ns; σ=0.15 ns @ 3 trials

benchmark    ns linear runtime
       M1 291.4 ==============================
       M2  70.3 =======

Java 9 (almost equal performance):

 0% Scenario{vm=java, trial=0, benchmark=M2} 99,15 ns; σ=8,34 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M1} 103,43 ns; σ=9,01 ns @ 10 trials

benchmark    ns linear runtime
       M2  99,1 ============================
       M1 103,4 ==============================

Java 13 (standard method is 38% faster):

 0% Scenario{vm=java, trial=0, benchmark=M2} 91,64 ns; σ=5,12 ns @ 10 trials
50% Scenario{vm=java, trial=0, benchmark=M1} 57,38 ns; σ=2,51 ns @ 10 trials

benchmark   ns linear runtime
       M2 91,6 ==============================
       M1 57,4 ==================
Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
  • 2
    My first thought was, you gain for simple cases, but lose for more complex scenarios where the regex engine’s use of the Boyer–Moore algorithm would pay off, but then, I noticed [your other answer](https://stackoverflow.com/a/33907329/2711488) stating that `indexOf(String)` is an intrinsic operation. Now, it would be interesting whether being called from `String.replace` counts as a use with a dynamic argument or whether the caller’s caller is taken into account here… – Holger Apr 20 '20 at 16:23
44

From the source code of java.lang.String1:

public String replace(CharSequence target, CharSequence replacement) {
   return Pattern
            .compile(target.toString(), Pattern.LITERAL)
            .matcher(this )
            .replaceAll(
                    Matcher.quoteReplacement(replacement.toString()));
}

String.replace(CharSequence target, CharSequence replacement) is implemented with java.util.regex.Pattern, therefore, it is not surprising that it is slower that StringUtils.replace(String text, String searchString, String replacement)2, which is implemented with indexOf and StringBuffer.

public static String replace(String text, String searchString, String replacement) {
    return replace(text, searchString, replacement, -1);
}

public static String replace(String text, String searchString, String replacement, int max) {
    if (isEmpty(text) || isEmpty(searchString) || replacement == null || max == 0) {
        return text;
    }
    int start = 0;
    int end = text.indexOf(searchString, start);
    if (end == -1) {
        return text;
    }
    int replLength = searchString.length();
    int increase = replacement.length() - replLength;
    increase = (increase < 0 ? 0 : increase);
    increase *= (max < 0 ? 16 : (max > 64 ? 64 : max));
    StringBuffer buf = new StringBuffer(text.length() + increase);
    while (end != -1) {
        buf.append(text.substring(start, end)).append(replacement);
        start = end + replLength;
        if (--max == 0) {
            break;
        }
        end = text.indexOf(searchString, start);
    }
    buf.append(text.substring(start));
    return buf.toString();
}

Footnote

1 The version that I links to and copied source code from is JDK 7

2 The version that I links to and copied source code from is common-lang-2.5

nhahtdh
  • 55,989
  • 15
  • 126
  • 162
  • 4
    Surprising that they use `StringBuffer`. It would almost certainly be even faster with `StringBuilder` in its place. – pickypg Apr 26 '13 at 05:20
  • 1
    @pickypg: Probably for compatibility reason? StringBuffer is available from Java 1.0, while StringBuilder is only available from Java 1.5. I think they do it for compatibility reason in old releases and didn't change it for the new release (where they raised the requirement to 1.5 and later 1.6). – nhahtdh Apr 26 '13 at 05:22
  • You appear to be right. The legacy version (2.6) supports Java 1.2. It wasn't until 3.1 they required Java 1.5. In the latest, they have [changed over to `StringBuilder`s](http://svn.apache.org/viewvc/commons/proper/lang/trunk/src/main/java/org/apache/commons/lang3/StringUtils.java?view=markup#l4396). Thought they required 1.5 since version 2.x. – pickypg Apr 26 '13 at 05:26
  • 4
    In many cases biased locking means the practical difference between `StringBuilder` and `StringBuffer` is close to nil, unless you are appending one character at a time or something. – BeeOnRope Apr 29 '13 at 05:25
  • @BeeOnRope: Right. From my testing when I commented on [Stephen C's post](http://stackoverflow.com/questions/16228992/commons-lang-stringutils-replace-performance-vs-string-replace/16229133?noredirect=1#comment23216562_16229071) (the test is not included here), there is very minor difference (that is not distinguishable from noise) between StringBuilder and StringBuffer's timing. – nhahtdh Apr 29 '13 at 05:32
  • 1
    It's interesting that the HotSpot JVM and JDK guys found two completely different solutions to the mistake of introducing basic classes like `Vector` and `StringBuffer` being synchronized by default - the first to introduce new classes in the JDK without synchronization - which requires end-user effort to update, but then a few years later the "magic" of biased locking comes along and means those who didn't convert mostly don't pay a large penalty. Even the underlying atomic instructions have been getting much faster on x86 for the cases where biased locking doesn't kick in. – BeeOnRope Apr 29 '13 at 05:37
  • It might be worth noting that `String.replace(char, char)` does not use regex in its implementation – RTF Dec 11 '14 at 13:59
  • @RTF: Yes, but it is irrelevant to the issue here (string replacement). – nhahtdh Dec 11 '14 at 14:02
  • 6
    Starting with Java 9, String.replace will be faster than before, see https://bugs.openjdk.java.net/browse/JDK-8058779. – Roland Illig Apr 24 '17 at 08:34
  • @RolandIllig: Yes, the new implementation should be faster for most practical cases. However, I think the new implementation will have quadratic complexity on worst case, while the regex implementation won't have this problem. – nhahtdh Apr 25 '17 at 01:58
11

Try this one, you'll notice that it's extremely performant than Apache's one:

public static String replace (String source, String os, String ns) {
    if (source == null) {
        return null;
    }
    int i = 0;
    if ((i = source.indexOf(os, i)) >= 0) {
        char[] sourceArray = source.toCharArray();
        char[] nsArray = ns.toCharArray();
        int oLength = os.length();
        StringBuilder buf = new StringBuilder (sourceArray.length);
        buf.append (sourceArray, 0, i).append(nsArray);
        i += oLength;
        int j = i;
        // Replace all remaining instances of oldString with newString.
        while ((i = source.indexOf(os, i)) > 0) {
            buf.append (sourceArray, j, i - j).append(nsArray);
            i += oLength;
            j = i;
        }
        buf.append (sourceArray, j, sourceArray.length - j);
        source = buf.toString();
        buf.setLength (0);
    }
    return source;
}
loukili
  • 111
  • 1
  • 2
  • 2
    It is slightly faster than Apache ones in short text replacement, but not necessarily faster than other implementations in long text replacement senario. Checkout https://github.com/greenlaw110/Benchmark4StringReplace in which Fast implementation is done using your function above – Gelin Luo Aug 14 '17 at 22:15
6

on my test with JMH:https://github.com/qxo/Benchmark4StringReplace The beset is loukili's way:

java -jar target/benchmarks.jar StringReplaceBenchmark -wi 3 -i 6 -f 1 -tu ms Benchmark Mode Cnt Score Error Units StringReplaceBenchmark.test4String thrpt 6 1255.017 ± 230.012 ops/ms StringReplaceBenchmark.test4StringUtils thrpt 6 4068.229 ± 67.708 ops/ms StringReplaceBenchmark.test4fast thrpt 6 4821.035 ± 97.790 ops/ms StringReplaceBenchmark.test4lang3StringUtils thrpt 6 3186.007 ± 102.786 ops/ms

qxo
  • 1,584
  • 15
  • 11
  • 1
    good project. I've cloned your project, added OSGL implementation and more test scenarios. I've submitted a PR to you – Gelin Luo Aug 14 '17 at 22:16
3

Why is that? Both methods seem to do the same work.

You would need to look at the source-code and do some serious investigation with a profiler to get a good (technical) answer to that.

However, one possible explanation is that StringUtils.replace and String.replace have been tuned for different use-cases. You are only looking at one case ... with a pretty small string, and a replacement string that is the same size as the substring being replaced.

Another possible explanation is that the Apache developers simply spent more time on tuning. (And lets not blame the Java developers for that. They have been working under severe staffing constraints for a long time. In the big scheme of things, there are many tasks more important than performance tuning String.replace.)


In fact, looking at the source code, it looks like the Java 7 version just uses the regular expression-based replace under the hood. By contrast, the Apache version is going to considerable lengths to avoid copying. Based on that evidence, I'd expect the performance difference between the two versions to be relatively smaller for large target strings. And I suspect the Java 7 version might even be better in some cases.

(Either non-technical explanation is plausible too, based on the evidence in the code.)

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • `Based on that evidence, I'd expect the performance difference between the two versions to be relatively smaller for large target strings.` There are several factors affecting the time of a replacement function: length of input, length of target string, length of replacement string, number of matches. Here, you only mention about the size of string, but how about the other parameters (esp. number of matches)? – nhahtdh Apr 26 '13 at 06:36
  • You'd need to look at how Matcher.replaceAll works. Personally, I'd expect the *number* of matches to not be that significant. I'd be more concerned about whether the replacement process required expansion of a StringBuilder or not. And that depends on whether the replacement increases the string length ... or not. (But I haven't looked ... ) – Stephen C Apr 26 '13 at 07:12
  • 1
    I haven't looked at the details, but from testing, `StringUtils` outperforms `String.replace` by 2-3 times for random string (I varied the number of matches, and also the length of the target string, the replacement string is allowed to be slightly shorter or longer than the target string). – nhahtdh Apr 26 '13 at 07:16