Is it worth it to check if a string contains a substring before replacing it?

Question

I saw this code today

if (translatedText.contains("â")) translatedText = translatedText.replace("â", "a");
if (translatedText.contains("ê")) translatedText = translatedText.replace("ê", "e");
...

There are 22 lines like that and I was wondering what is the point of using "ifs" before a replace. The way I understand it works, we are reading the string twice per line, while calling the replace method directly would have the same effect when there is nothing to replace and it would faster when there is something to replace.

But that is only how I guess it works. Can someone confirm or correct me?

And a second question. We are doing that replace for every vowel and for every symbol "á", "à", "â" and "ä". I bet there is a better way to do that in Java. Any suggestions?

Thanks.

Why are you doing this? I'm pretty sure the requirement behind it is a mistake - I'm not trying to judge or anything, I just believe we can help yoiu on a deeper level — Sebas, Dec 19 '16 at 11:37
@Sebas I think, it isn't his code - he just met it and wonder is ifs are required — Michel_T., Dec 19 '16 at 11:40
If an operation in a code is perform after condition check then it perform best too,otherwise it run anyways. — Mohit Dixit, Dec 19 '16 at 11:48
for `if`condition you need to look at your data length and frequency of your invalid characters , if length is relatively big then i would suggest to directly go for replace — Pavneet_Singh, Dec 19 '16 at 11:51

score 7 · Answer 1 · edited May 23 '17 at 12:09

The documentation doesn't tell us what replace will do if there's no matching substring, but looking at the current implementation in Oracle's version (Java 8):

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

...it does look like you avoid some work and in particular memory allocations (the matcher) if you check first.

Which isn't to say that there isn't likely a better way to approach those 22 replacements, probably by using a single regular expression with a character class ([âê] and so on), compiling that regex once, and then using a single matcher in a loop, very roughly like this (inspired by this answer):

// You can do this part once somewhere if you want to
Pattern regex = Pattern.compile("[âê]");
// Then:
StringBuffer resultString = new StringBuffer();
Matcher regexMatcher = regex.matcher(translatedText);
while (regexMatcher.find()) {
    String match = regexMatch.group();
    String replacement;
    switch (match) {
        // ...various cases setting `replacement`
    }
    regexMatcher.appendReplacement(resultString, replacement);
}
regexMatcher.appendTail(resultString);
translatedText = resultString.toString();

or if you want to prematurely micro-optimize it (a failing of mine):

// You can do this part once somewhere if you want to
Pattern regex = Pattern.compile("[âê]");
// Then:
StringBuffer resultString = null;
Matcher regexMatcher = regex.matcher(translatedText);
while (regexMatcher.find()) {
    if (resultString == null) {
        resultString = new StringBuffer(translatedText.length() + 100);
    }
    String match = regexMatch.group();
    String replacement;
    switch (match) {
        // ...various cases setting `replacement`
    }
    regexMatcher.appendReplacement(resultString, replacement);
}
if (resultString != null) {
    regexMatcher.appendTail(resultString);
    translatedText = resultString.toString();
}

@GhostCat: :-) You're the only one who seems to have noticed. — T.J. Crowder, Dec 19 '16 at 11:46
My bad. I agree when nobody cares me hitting 25K; but its not like this site sees people reaching 500K daily. — GhostCat, Dec 19 '16 at 11:51
@GhostCat: Well, good of *you* to notice. :-) (And indeed, it seems to have happened a grand total of 12 times previously... Then there's Crowder, lucky 13!) — T.J. Crowder, Dec 19 '16 at 11:52
I hereby promise: upon making 500K, I will simply invite the whole SO for a celebration party ... but given the fact how slowly upvotes are coming in today, hmm, well, that gonna take a while. — GhostCat, Dec 19 '16 at 12:06

score 3 · Accepted Answer · answered Dec 19 '16 at 11:41

Regarding "performance": that might really depend on the JVM version; in other words: depending on the implementation of replace() changes, having that if in place can save you some regex-matcher cost; or not.

Regarding the second question: basically you have a lot of duplicated code there. One way to work on that:

You could define some static final field, like:

Map<String, String> replacements = new HashMap<>();

To then fill with:

replacements.put("â", "a");
...

To then replace the current code with a loop that iterates the entries of that map, using each key/value as argument to a replace() call.

Or, as shown by the other answer, you do something like

replacements.put("[áàâä]", "a");

to then use replaceAll() later on.

Pavneet_Singh · Answer 3 · 2016-12-23T20:54:07.273

2

You can use regex to replace all unwanted chars with your character

String s="sasaáàdaâadsasä";
System.out.println(s.replaceAll("[áàâä]", "a"));

Output :

sasaaadaaadsasa

[] mean match any occurrence of character inside this and replace if found

To replace multiple characters you can chained your replace calls and simply avoid if conditions

String s="sasaáàdaâadsêêêasä";
String str=s.replaceAll("[áàâä]", "a").replaceAll("[ê]", "e");
System.out.println(str);

Output:

sasaaadaaadseeeasa

edited Dec 23 '16 at 20:54

answered Dec 19 '16 at 11:40

Pavneet_Singh

36,884
5
53
68

2

His use case includes having "ê" as well though, so you need a mapping for each character. – runDOSrun Dec 19 '16 at 11:42
1

Would still reduce calls to one for each "target"-char @runDOSrun. – Fildor Dec 19 '16 at 11:43
1

@Fildor Correct, but it doesn't provide an alternative to the `if`s. – runDOSrun Dec 19 '16 at 11:45
1

@runDOSrun Well, Pavneet does not state this explicitly, but I guess he would ditch the `if`s. – Fildor Dec 19 '16 at 11:47

score 2 · Answer 4 · answered Dec 19 '16 at 12:36

If what you desire is to get rid of the apparently redundant if statements without incurring a performance penalty, the quick solution is to switch to using replace(char, char):

translatedText = translatedText.replace('â', 'a');
translatedText = translatedText.replace('ê', 'e');

This avoids regex altogether, whether explicit or hidden, and in my Java 8 also avoids making a new String if there is no replacement.

Whether there is a still better way depends on several factors including taste. A couple of the other answers have promising ideas.

Is it worth it to check if a string contains a substring before replacing it?

4 Answers4