13

I know of two ways of replacing all occurrences of substring in a string.

The regex way (assuming "substring-to-be-replaced" doesn't include regex special chars):

String regex = "substring-to-be-replaced" + "+";
Pattern scriptPattern = Pattern.compile(regex);
Matcher matcher = scriptPattern.matcher(originalstring);
newstring = matcher.replaceAll("replacement-substring");

The String.replace() way:

newstring = originalstring.replace("substring-to-be-replaced", "replacement-substring");

Which of the two is more efficient (and why)?

Are there more efficient ways than the above described two?

Regex Rookie
  • 10,432
  • 15
  • 54
  • 88
  • 2
    what do you mean by efficient? Less memory drain? Less processor time? – Vladimir Ivanov Mar 23 '11 at 15:27
  • Considering that `regex` would work out to `substring-to-be-replaced+`, which only allows for multiple `d`s at the end, they don't even do anywhere near the same thing. – user Mar 23 '11 at 15:28
  • @Vladimir Ivanov Faster. – Regex Rookie Mar 23 '11 at 15:31
  • @Regex Rookie. So "less processor time"? – user Mar 23 '11 at 15:32
  • 1
    @Michael Kjörling You are asking a good question. Once upon a time "faster" translated directly to "less processor time". Is it still the case? – Regex Rookie Mar 23 '11 at 15:34
  • Well, intuitively, "faster" translates to "with all else identical, less wall-clock time". Same hardware, same input data, etc. That would seem to imply less processor time. Of course, all else is never identical... – user Mar 23 '11 at 15:43
  • I humbly suggest that the second is often much more efficient than the first, because it takes much less wall-clock time to write, and less to read, understand, and maintain. – Theodore Murdock Feb 04 '14 at 19:08

5 Answers5

16

String.replace() uses regex underneath.

public String replace(CharSequence target, CharSequence replacement) {
      return Pattern.compile(target.toString(), Pattern.LITERAL)
             .matcher(this ).replaceAll(
               Matcher.quoteReplacement(replacement.toString()));
  }

Are there more efficient ways than the above described two?

There are given that you operate on an implementation backed e.g., by an array, rather than the immutable String class (since string.replace creates a new string on each invocation). See for instance StringBuilder.replace().

Compiling a regex incurs quite alot of overhead which is clear when observing the Pattern source code. Luckily, Apache offers an alternative approach in StringUtils.replace() which according to the source code (line #3732) is quite efficient.

Johan Sjöberg
  • 47,929
  • 21
  • 130
  • 148
  • The String implementation you point to uses a `Pattern` for the replace method in question. – Jeremy Mar 23 '11 at 15:33
  • @Johan Sjöberg StringBuilder.replace() is not the same as String.replace() because it accepts start & end indexes instead of the "string-to-be-substitued". – Regex Rookie Mar 23 '11 at 16:33
  • @Regex Rookie, precisely, that's part of what makes it more efficient – Johan Sjöberg Mar 23 '11 at 16:41
  • @Johan Sjöberg But I like the magic (convenience) of simply quoting the string to be substituted. :) Using StringBuilder.replace() will force me to use the regex approach (which is used anyway, as @Jeremy Heiler pointed out). – Regex Rookie Mar 23 '11 at 16:45
  • 1
    @Regex Rookie, Conveniance is good. Don't seek to optimize unless duly justified. Also, see my update for a great alternative. – Johan Sjöberg Mar 23 '11 at 16:58
  • @Johan Sjöberg Wow. Thanks for the impressive tweaking of your answer. This is remarkable and very educating. – Regex Rookie Mar 23 '11 at 17:03
  • @Johan, StringBuilder.reaplce **IS NOT** efficient. It involves partial copy of the content (esp. if you do replace in the beginning, it's not good). If you want efficiency you need to build the string from get go, either via StringBuilder (but no replace/insert) or simple char[]. – bestsss Mar 23 '11 at 17:06
  • @bestsss, compared to compiling a regex, it doesn't necessarily seem [that awful](http://www.java2s.com/Open-Source/Java-Document/6.0-JDK-Core/lang/java/lang/AbstractStringBuilder.java.htm) – Johan Sjöberg Mar 23 '11 at 17:11
  • @Johan, I know very well how it's implemented. With relatevely long StringBuffer replacing at the beginning will be less efficient than RegExp. If you see my answer I am still quite clueless why Sun went RegExp route for replaceAll. It takes like 15-20 lines of code to implement a proper function. Partly StringBuffer lost its charm in java1.5 since it always copies the char[] (but at least doesn't waste memory any more). Also adding StringBuilder was properly relatively dumb idea instead of relying on the JVM and simplistic escape analysis. – bestsss Mar 23 '11 at 17:39
  • Apache StringUtils.replace() is a good shout, Much faster than String.replace() – Lawrence Tierney Jun 18 '19 at 11:24
2

Here's the source code from openjdk:

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
       this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}
Jeremy
  • 22,188
  • 4
  • 68
  • 81
  • 1
    Therefore, it turns out, if one has to replace the same string often, one will be better off with replaceAll using a precompiled Regex. – Ingo Mar 23 '11 at 16:15
1

Instead of using strings, which are immutable, use char arrays or some other mutable type (such as StringBuffer or StringBuilder).

David Weiser
  • 5,190
  • 4
  • 28
  • 35
  • Thanks for this tip. Can you elaborate more on this? I know that originalstring doesn't get modified when using replace(). Can you provide an example? – Regex Rookie Mar 23 '11 at 15:30
  • try using a [StringBuffer](http://download.oracle.com/javase/1.4.2/docs/api/java/lang/StringBuffer.html). – David Weiser Mar 23 '11 at 15:32
  • 1
    A `StringBuilder` would be more appropriate in non-concurrent situations. – Jeremy Mar 23 '11 at 15:34
1

Not having done any profiling or benchmarking, I'd say it's a fairly safe bet that if you don't need regex magic, then the overhead of the regular expression parser (which you'll get no matter what, in terms of memory as well as CPU usage) costs you a lot more than you can possibly gain on the other end.

user
  • 6,897
  • 8
  • 43
  • 79
0

Shouldn't you compare replaceAll 2 times? However, for a single invocation it will hardly be measurable. And will you do millions of comparisions?

Then I would expect 'compile' to be faster, but only, if you don't use a constant String without any pattern-rules.

Where is the problem in writing a micro benchmark? Or look up the source.

user unknown
  • 35,537
  • 11
  • 75
  • 121