I've been trying to devise a way to replace multiple String#replaceAll calls with a single Pattern/Matcher instance, in the hope that it would be faster than my current method of replacing text in a String, but I'm not sure how to go about it.

Here is an example of a String that I want to manipulate:

@bla@This is a @red@line @bla@of text.

As you can see, each tag is an @ character, exactly 3 characters, and another @ character; this will always be the case. If I want to replace every instance of '@xxx@' (where each x is a lowercase letter or a digit from 0 to 9), what is the most efficient way to do it? Currently I store a Map whose keys are the '@xxx@' substrings and whose values are what I want to replace each substring with. For each key, I check whether the String contains that '@xxx@' substring and, if it does, call replaceAll. I imagine this is pretty inefficient.
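The tag rule described above (exactly three lowercase letters or digits between two @ characters) fits in one regular expression, `@([a-z0-9]{3})@`, whose group 1 is the keyword. A minimal sketch that only finds the tags, assuming ASCII input; the class and method names (`TagScan`, `findTags`) are hypothetical, for illustration only:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagScan {
    // '@', then exactly three lowercase ASCII letters or digits, then '@'.
    // Group 1 captures the keyword without the '@' markers.
    private static final Pattern TAG = Pattern.compile("@([a-z0-9]{3})@");

    // Returns the keywords of all tags in the input, in order of appearance.
    static List<String> findTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(input);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(findTags("@bla@This is a @red@line @bla@of text.")); // [bla, red, bla]
    }
}
```

Keeping the compiled Pattern in a static final field means it is compiled once, whereas String#replaceAll compiles its regex argument again on every call.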

Thank you very much!

TL;DR - Would using a Pattern/Matcher to replace substrings of a String with different Strings be more efficient than checking whether the String contains each substring and calling String#replaceAll? If so, how would I go about it?
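For comparison, the current per-key approach might look roughly like the sketch below. This is a hypothetical reconstruction (the original code isn't shown), and it uses the literal String#replace instead of replaceAll because the keys are plain text. Each map entry costs at least one more scan of the string, so the work grows with the number of keys, while a Matcher-based loop scans the input once.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MultiPassReplace {
    // Hypothetical reconstruction of the "one replace per key" approach:
    // every map entry triggers at least one more scan of the current result.
    static String replaceEach(String input, Map<String, String> replacements) {
        String result = input;
        for (Map.Entry<String, String> e : replacements.entrySet()) {
            // contains() scans until it finds the key (or reaches the end)
            if (result.contains(e.getKey())) {
                // replace() treats both arguments literally (no regex syntax)
                result = result.replace(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        map.put("@bla@", "hello,");
        map.put("@red@", "world!");
        // prints "hello,This is a world!line hello,of text."
        System.out.println(replaceEach("@bla@This is a @red@line @bla@of text.", map));
    }
}
```

Apart from speed, the two approaches can give different results: each pass works on the output of the previous one, so a replacement value that contains another key gets replaced again. With @aaa@ mapped to @bbb@ and @bbb@ mapped to x, inserted in that order into a LinkedHashMap, the input @aaa@ ends up as x, whereas a single Matcher pass over the original text would produce @bbb@.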

Jacob G.
  • `replaceAll` is already regex; learn to use regex. – Boris the Spider Dec 31 '16 at 10:57
  • 2
    @BoristheSpider The question isn't how to write regex, but how to replace multiple different `@keyword@` patterns with values that depend on the `keyword`, without using multiple `replaceAll()` calls. The trick is the [`appendReplacement()`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#appendReplacement-java.lang.StringBuffer-java.lang.String-) and [`appendTail()`](https://docs.oracle.com/javase/8/docs/api/java/util/regex/Matcher.html#appendTail-java.lang.StringBuffer-) methods of `Matcher`. – Andreas Dec 31 '16 at 10:59
  • @Andreas Thank you, that's a very interesting way of doing it where the String only needs to be searched through once. Is there a reason why you used a StringBuffer rather than a StringBuilder? Also, wouldn't it become slightly verbose if I had ~50 different possible substrings that can be replaced? – Jacob G. Dec 31 '16 at 11:05
  • 1
    I am voting to re-open this question. I think the duplicate is pretty close, but its answers do not provide a good match for a massive number of different replacements, as in OP's example. Although the answers can definitely be re-worked, I think there is a benefit to answering this question directly. – Sergey Kalinichenko Dec 31 '16 at 11:06
  • @JacobG. It uses `StringBuffer` because that is what `appendReplacement()` requires (in Java 8). I believe a `StringBuilder` overload is added in Java 9. – Andreas Dec 31 '16 at 11:08
  • @Andreas Great! We'll have to come back to this question in 7 months then! – Jacob G. Dec 31 '16 at 11:10
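As mentioned in the comments, Java 9 added StringBuilder overloads of appendReplacement() and appendTail(), so on Java 9+ the loop no longer needs a StringBuffer. The sketch also speaks to the verbosity concern: the keywords live in the map, so the loop stays the same whether there are 2 or 50 of them. The names are hypothetical, and keeping unknown tags unchanged is an assumption of this sketch, not something the question specifies.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BuilderReplace {
    // Same tag rule as in the question: three lowercase letters/digits between '@'s.
    private static final Pattern TAG = Pattern.compile("@([a-z0-9]{3})@");

    // Single pass over the input. Needs Java 9+ for the StringBuilder overloads
    // of appendReplacement() and appendTail().
    static String replaceTags(String input, Map<String, String> replacements) {
        Matcher m = TAG.matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // Assumption: a tag with no map entry is kept as-is (group() is the whole match).
            String value = replacements.getOrDefault(m.group(1), m.group());
            // quoteReplacement() keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = Map.of("bla", "hello,", "red", "world!");
        // prints "hello,This is a world!line hello,of text. @xyz@"
        System.out.println(replaceTags("@bla@This is a @red@line @bla@of text. @xyz@", replacements));
    }
}
```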

2 Answers


This is a more dynamic version of a previous answer to another, similar question.

Here is a helper method that replaces any @keyword@ you put in the map. The keywords don't have to be 3 characters long.

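import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static String replace(String input, Map<String, String> replacement) {
    if (replacement.isEmpty()) return input; // an empty map would build "@()@", which matches "@@" and has no value
    StringJoiner regex = new StringJoiner("|", "@(", ")@");
    for (String keyword : replacement.keySet())
        regex.add(Pattern.quote(keyword));
    StringBuffer output = new StringBuffer();
    Matcher m = Pattern.compile(regex.toString()).matcher(input);
    while (m.find())
        m.appendReplacement(output, Matcher.quoteReplacement(replacement.get(m.group(1))));
    return m.appendTail(output).toString();
}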

The above runs on Java 8+. In Java 9+, the loop can be replaced by a lambda expression passed to Matcher.replaceAll(Function<MatchResult, String>). The following version also fixes the potential issue of a short keyword being a prefix of a longer one (such as foo and foobar) by sorting the keywords by length in descending order.

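import java.util.Comparator;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
private static String replace(String input, Map<String, String> replacement) {
    if (replacement.isEmpty()) return input; // an empty map would build the pattern "@()@"
    String regex = replacement.keySet().stream()
            .sorted(Comparator.comparingInt(String::length).reversed())
            .map(Pattern::quote).collect(Collectors.joining("|", "@(", ")@"));
    return Pattern.compile(regex).matcher(input)
            .replaceAll(m -> Matcher.quoteReplacement(replacement.get(m.group(1))));
}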

Test

Map<String,String> replacement = new HashMap<>();
replacement.put("bla", "hello,");
replacement.put("red", "world!");
replacement.put("Hold", "wait");
replacement.put("Better", "more");
replacement.put("a?b*c", "special regex characters");
replacement.put("foo @ bar", "with spaces and the @ boundary character work");

System.out.println(replace("@bla@This is a @red@line @bla@of text", replacement));
System.out.println(replace("But @Hold@, this can do @Better@!", replacement));
System.out.println(replace("It can even handle @a?b*c@ without dying", replacement));
System.out.println(replace("Keyword @foo @ bar@ too", replacement));

Output

hello,This is a world!line hello,of text
But wait, this can do more!
It can even handle special regex characters without dying
Keyword with spaces and the @ boundary character work too
Andreas
  • I really wish I could accept both of your answers as they both suffice, but my substrings will only be 3 characters long in between the @ symbols. However, I imagine this will definitely help someone in the future, thank you! – Jacob G. Dec 31 '16 at 11:17
  • 1
    That's nice, especially the use of `Pattern.quote(keyword)` that is very easy to miss. – Sergey Kalinichenko Dec 31 '16 at 11:19
  • 1
    @dasblinkenlight DOH! I had forgotten the even easier to miss `Matcher.quoteReplacement()`. Fixed. ;-) – Andreas Dec 31 '16 at 11:22
  • @Andreas I missed that one as well. Thanks! – Sergey Kalinichenko Dec 31 '16 at 11:29
  • 1
    That is a very nice answer. I will just add that if we could assume that `keyword` literals can have different lengths and one can *contain* another like `foo` and `foobar` then we can't allow regex builder to create regex like `foo|foobar` because `foo` will always prevent `foobar` from being matched. This could be solved by ordering literals by their length in descending order. – Pshemo Dec 31 '16 at 13:04
  • @Pshemo True, or by the caller using a `LinkedHashSet` to ensure `foobar` is listed before `foo`. However, the surrounding `@` markers make this mostly a moot point. – Andreas Dec 31 '16 at 18:00
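The ordering question in these comments can be checked directly. Java's regex alternation tries alternatives from left to right and accepts the first one that lets the rest of the pattern match. This small sketch (the class name is hypothetical) shows a bare `foo|foobar` stopping at `foo`, longest-first ordering fixing it, the closing `@` forcing the engine to backtrack into `foobar` (which is why the markers make ordering mostly moot), and the remaining case where a keyword itself contains `@`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlternationOrder {
    // Returns the text of the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Bare alternation: "foo" succeeds first and ends the match -> foo
        System.out.println(firstMatch("foo|foobar", "foobar"));
        // Longest first -> foobar
        System.out.println(firstMatch("foobar|foo", "foobar"));
        // After "foo" the closing '@' fails against 'b', so the engine
        // backtracks and tries "foobar" -> @foobar@
        System.out.println(firstMatch("@(foo|foobar)@", "@foobar@"));
        // If a keyword itself contains '@', order matters again -> @a@
        System.out.println(firstMatch("@(a|a@b)@", "@a@b@"));
    }
}
```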

This is a relatively straightforward case for appendReplacement:

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Prepare map of replacements
Map<String,String> replacement = new HashMap<>();
replacement.put("bla", "hello,");
replacement.put("red", "world!");
// Use a pattern that matches three non-@s between two @s
Pattern p = Pattern.compile("@([^@]{3})@");
Matcher m = p.matcher("@bla@This is a @red@line @bla@of text");
StringBuffer sb = new StringBuffer();
while (m.find()) {
    // Group 1 captures what's between the @s
    String tag = m.group(1);
    String repString = replacement.get(tag);
    if (repString == null) {
        System.err.println("Tag @"+tag+"@ is unexpected.");
        // Skipping appendReplacement() leaves the unknown tag unchanged in the output
        continue;
    }
    // Replacement could have special characters, e.g. '\'
    // Matcher.quoteReplacement() will deal with them correctly:
    m.appendReplacement(sb, Matcher.quoteReplacement(repString));
}
m.appendTail(sb);
String result = sb.toString();

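The Matcher.quoteReplacement() call in this code matters because, in a replacement string, `$` introduces a group reference and `\` escapes the next character, so an unescaped value is not copied literally. A small sketch reusing this answer's pattern; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {
    // Same tag pattern as the answer: three non-'@' characters between '@'s.
    private static final Pattern TAG = Pattern.compile("@([^@]{3})@");

    // Replaces every tag with the same value, optionally escaping it first.
    static String replaceTags(String input, String value, boolean quote) {
        Matcher m = TAG.matcher(input);
        return m.replaceAll(quote ? Matcher.quoteReplacement(value) : value);
    }

    public static void main(String[] args) {
        // Quoted: the value is copied literally -> Total: $1
        System.out.println(replaceTags("Total: @amt@", "$1", true));
        // Unquoted: "$1" means "capturing group 1", silently giving -> Total: amt
        System.out.println(replaceTags("Total: @amt@", "$1", false));
        try {
            // Unquoted "$5" refers to a group the pattern does not have, so this throws
            replaceTags("Total: @amt@", "$5", false);
        } catch (IndexOutOfBoundsException | IllegalArgumentException e) {
            System.out.println("unquoted value rejected: " + e);
        }
    }
}
```

The silent case is the more dangerous one: a value like $1 doesn't fail, it just produces the wrong text.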

Sergey Kalinichenko
  • As my String will only have 3 characters in between the @ characters, this is exactly what I was looking for, thank you! – Jacob G. Dec 31 '16 at 11:15