10

Say I have a file that contains some text. It contains substrings like "substr1", "substr2", "substr3", etc. I need to replace all of those substrings with some other text, like "repl1", "repl2", "repl3". In Python, I would create a dictionary like this:

{
 "substr1": "repl1",
 "substr2": "repl2",
 "substr3": "repl3"
}

and build a pattern by joining the keys with '|', then do the replacement with the re.sub function. Is there a similarly simple way to do this in Java?

Boann
Andrii Yurchuk

5 Answers

13

This is how your Python-suggestion translates to Java:

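import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

Map<String, String> replacements = new HashMap<String, String>() {{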
    put("substr1", "repl1");
    put("substr2", "repl2");
    put("substr3", "repl3");
}};

String input = "lorem substr1 ipsum substr2 dolor substr3 amet";

// create the pattern joining the keys with '|'
String regexp = "substr1|substr2|substr3";

StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(input);

while (m.find())
    m.appendReplacement(sb, replacements.get(m.group()));
m.appendTail(sb);


System.out.println(sb.toString());   // lorem repl1 ipsum repl2 dolor repl3 amet

This approach performs a simultaneous (i.e. "at once") replacement. That is, if you happened to have

"a" -> "b"
"b" -> "c"

then this approach would give "a b" -> "b c", whereas the answers that chain several calls to replace or replaceAll would give "c c".
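To make the difference concrete, here is a minimal sketch that runs both strategies on the input "a b". The class and method names (ChainedVsSimultaneous, chained, simultaneous) are made up for illustration, and the replacement choice is hard-coded for the two keys above rather than looked up in a map.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ChainedVsSimultaneous {
    // Chained calls: the second replace also rewrites the output of the first.
    static String chained(String input) {
        return input.replace("a", "b").replace("b", "c");
    }

    // Single pass: every match is decided against the original text only.
    static String simultaneous(String input) {
        Matcher m = Pattern.compile("a|b").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Hard-coded mapping for this demo: "a" -> "b", "b" -> "c"
            m.appendReplacement(sb, m.group().equals("a") ? "b" : "c");
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chained("a b"));      // c c
        System.out.println(simultaneous("a b")); // b c
    }
}
```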


(If you generalize this approach to create the regexp programmatically, make sure you apply Pattern.quote to each individual search word and Matcher.quoteReplacement to each replacement word.)
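A possible generalization along those lines is sketched below. It is not part of the original answer: the class name MapReplacer and the method replaceAll are hypothetical, and the sketch assumes a map with at least one key (an empty map returns the input unchanged). Keys are quoted with Pattern.quote so that characters like '.' are matched literally, and values go through Matcher.quoteReplacement so that '$' and '\' are not interpreted as group references.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MapReplacer {
    // Hypothetical helper: replaces every key of the map with its value in one pass.
    static String replaceAll(String input, Map<String, String> replacements) {
        if (replacements.isEmpty()) {
            return input; // nothing to search for
        }
        // Build "\Qkey1\E|\Qkey2\E|..." so every key is treated as a literal.
        StringJoiner alternation = new StringJoiner("|");
        for (String key : replacements.keySet()) {
            alternation.add(Pattern.quote(key));
        }
        Matcher m = Pattern.compile(alternation.toString()).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // quoteReplacement keeps '$' and '\' in the value from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacements.get(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> replacements = new LinkedHashMap<>();
        replacements.put("a.b", "$1");
        replacements.put("substr1", "repl1");
        System.out.println(replaceAll("a.b axb substr1", replacements)); // $1 axb repl1
    }
}
```

Java tries the alternatives of a regex alternation from left to right, so when one key is a prefix of another (say "Java" and "JavaScript"), the map's iteration order decides which one wins. A LinkedHashMap with the longer keys inserted first is one way to control that.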

palacsint
aioobe
  • How does this approach differ from StringUtils.replaceEach? Or is replaceEach the same as replaceAll? – Andrii Yurchuk Oct 05 '11 at 13:05
  • This approach is more general as you can provide an arbitrary replacement-function (look at the `m.appendReplacement` line). Secondly it doesn't require you to include a third party library for the sake of a string-manipulation routine. (If you already depend on the Apache Commons, or don't bother at all with another dependency, then go with the `replaceEach` approach.) – aioobe Oct 05 '11 at 13:07
  • (No, `replaceEach` is not the same as `replaceAll`. `replaceAll` is just a regexp-version of `replace`.) – aioobe Oct 05 '11 at 13:09
  • Does replaceEach play with those "a b" -> "b c" the same as your solution? – Andrii Yurchuk Oct 05 '11 at 13:30
  • 1
    Your solution sounds better than replaceEach, however, I still need to find a way to join the keys of a map to create the pattern programmatically. The simplest way I see to do this is to use join() from StringUtils :) – Andrii Yurchuk Oct 05 '11 at 13:51
  • Heheh good point. I'm not sure join will help you though. You need to quote the keys / values. (See my last note.) Unless you find a way to do it without loops, you could just as well join the strings while performing the quoting :-) – aioobe Oct 05 '11 at 13:55
  • "join the strings while performing the quoting" What do you mean? – Andrii Yurchuk Oct 05 '11 at 14:03
  • 1
    For my approach to work, you need to use `Pattern.quote` on the keys. In other words, you'll have to loop over the keys either way. (StringUtils.join won't save you from looping!) – aioobe Oct 05 '11 at 14:07
  • `StringBuffer` is synchronized, in this case it's better to use `StringBuilder` – Duarte Meneses Mar 17 '16 at 15:17
  • Please avoid using `new HashMap<>() {{ /*puts*/ }};` in answers. A lot of people do not know the adverse side-effects of this method of Map initialization in frequently executed code. – Czar Mar 13 '17 at 14:08
  • @Czar, what adverse negative side-effects are you talking about? – aioobe Mar 13 '17 at 14:18
  • @aioobe, short summary: https://dzone.com/articles/double-brace-initialization Or just google `double brace initialization antipattern` for more info – Czar Mar 13 '17 at 15:12
  • I don't think an extra .class file matters much here. (You already have them everywhere for anonymous listener implementations etc.) I don't really get the argument about lacking support for the diamond operator. Double-brace initialization still saves characters, right? Also, the diamond operator for anonymous classes is supported in Java 9. – aioobe Mar 13 '17 at 15:42
7

There is StringUtils.replaceEach in the Apache Commons Lang project, but it works on Strings.

turbanoff
palacsint
2

First, a demonstration of the problem:

String s = "I have three cats and two dogs.";
s = s.replace("cats", "dogs")
    .replace("dogs", "budgies");
System.out.println(s);

This is intended to replace cats => dogs and dogs => budgies, but the sequential replacement operates on the result of the previous replacement, so the unfortunate output is:

I have three budgies and two budgies.

Here's my implementation of a simultaneous replacement method. It's easy to write using String.regionMatches:

public static String simultaneousReplace(String subject, String... pairs) {
    if (pairs.length % 2 != 0) throw new IllegalArgumentException(
        "Strings to find and replace are not paired.");
    StringBuilder sb = new StringBuilder();
    int numPairs = pairs.length / 2;
    outer:
    for (int i = 0; i < subject.length(); i++) {
        for (int j = 0; j < numPairs; j++) {
            String find = pairs[j * 2];
            if (subject.regionMatches(i, find, 0, find.length())) {
                sb.append(pairs[j * 2 + 1]);
                i += find.length() - 1;
                continue outer;
            }
        }
        sb.append(subject.charAt(i));
    }
    return sb.toString();
}

Testing:

String s = "I have three cats and two dogs.";
s = simultaneousReplace(s,
    "cats", "dogs",
    "dogs", "budgies");
System.out.println(s);

Output:

I have three dogs and two budgies.

Additionally, when doing simultaneous replacement it is sometimes useful to look for the longest match. (PHP's strtr function does this, for example.) Here is my implementation for that:

public static String simultaneousReplaceLongest(String subject, String... pairs) {
    if (pairs.length % 2 != 0) throw new IllegalArgumentException(
        "Strings to find and replace are not paired.");
    StringBuilder sb = new StringBuilder();
    int numPairs = pairs.length / 2;
    for (int i = 0; i < subject.length(); i++) {
        int longestMatchIndex = -1;
        int longestMatchLength = -1;
        for (int j = 0; j < numPairs; j++) {
            String find = pairs[j * 2];
            if (subject.regionMatches(i, find, 0, find.length())) {
                if (find.length() > longestMatchLength) {
                    longestMatchIndex = j;
                    longestMatchLength = find.length();
                }
            }
        }
        if (longestMatchIndex >= 0) {
            sb.append(pairs[longestMatchIndex * 2 + 1]);
            i += longestMatchLength - 1;
        } else {
            sb.append(subject.charAt(i));
        }
    }
    return sb.toString();
}

Why would you need this? Example follows:

String truth = "Java is to JavaScript";
truth += " as " + simultaneousReplaceLongest(truth,
    "Java", "Ham",
    "JavaScript", "Hamster");
System.out.println(truth);

Output:

Java is to JavaScript as Ham is to Hamster

If we had used simultaneousReplace instead of simultaneousReplaceLongest, the output would have had "HamScript" instead of "Hamster" :)

Note that the above methods are case-sensitive. If you need case-insensitive versions it is easy to modify the above because String.regionMatches can take an ignoreCase parameter.
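As a rough sketch of that modification (an adaptation for illustration, not code from the answer above), the first-match variant could look like this using the five-argument overload String.regionMatches(boolean ignoreCase, int toffset, String other, int ooffset, int len). The class and method names are hypothetical, and the loop is restructured slightly so that empty search strings are skipped.

```java
class CaseInsensitiveReplace {
    // Hypothetical case-insensitive variant of a first-match simultaneous replace.
    static String simultaneousReplaceIgnoreCase(String subject, String... pairs) {
        if (pairs.length % 2 != 0) throw new IllegalArgumentException(
            "Strings to find and replace are not paired.");
        StringBuilder sb = new StringBuilder();
        int i = 0;
        while (i < subject.length()) {
            boolean matched = false;
            for (int j = 0; j < pairs.length; j += 2) {
                String find = pairs[j];
                // ignoreCase = true makes "CATS" match the search string "cats".
                if (!find.isEmpty()
                        && subject.regionMatches(true, i, find, 0, find.length())) {
                    sb.append(pairs[j + 1]);
                    i += find.length(); // skip past the matched text
                    matched = true;
                    break;
                }
            }
            if (!matched) {
                sb.append(subject.charAt(i));
                i++;
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = simultaneousReplaceIgnoreCase("I have three CATS and two Dogs.",
            "cats", "dogs",
            "dogs", "budgies");
        System.out.println(s); // I have three dogs and two budgies.
    }
}
```

Note that the replacement text is inserted exactly as given; the sketch makes no attempt to preserve the case of the matched text.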

Boann
2
yourString.replace("substr1", "repl1")
          .replace("substr2", "repl2")
          .replace("substr3", "repl3");
aioobe
Eng.Fouad
  • 4
    +1... That's not "all at once" though. If the example was different, say `"a" -> "b"` and `"b" -> "c"` then there would be no `b`s in the result, even though there were `a`s in the input. – aioobe Oct 05 '11 at 12:49
  • @aioobe: `StringUtils.replaceEach()` handles this well. – palacsint Oct 05 '11 at 12:51
-1
    return yourString.replaceAll("substr1","repl1").
                     replaceAll("substr2","repl2").
                     replaceAll("substr3","repl3");
Balconsky
  • -1. This isn't all at once, and unnecessarily uses a regex method (replaceAll) instead of the plain String method (replace). – Boann Nov 27 '13 at 04:11