8

I saw this code today

if (translatedText.contains("â")) translatedText = translatedText.replace("â", "a");
if (translatedText.contains("ê")) translatedText = translatedText.replace("ê", "e");
...

There are 22 lines like that and I was wondering what is the point of using "ifs" before a replace. The way I understand it works, we are reading the string twice per line, while calling the replace method directly would have the same effect when there is nothing to replace and it would faster when there is something to replace.

But that is only how I guess it works. Can someone confirm or correct me?

And a second question. We are doing that replace for every vowel and for every symbol "á", "à", "â" and "ä". I bet there is a better way to do that in Java. Any suggestions?

Thanks.

Gonzalo Calvo
  • 304
  • 4
  • 15
  • Why are you doing this? I'm pretty sure the requirement behind it is a mistake - I'm not trying to judge or anything, I just believe we can help yoiu on a deeper level – Sebas Dec 19 '16 at 11:37
  • 1
    @Sebas I think, it isn't his code - he just met it and wonder is ifs are required – Michel_T. Dec 19 '16 at 11:40
  • If an operation in a code is perform after condition check then it perform best too,otherwise it run anyways. – Mohit Dixit Dec 19 '16 at 11:48
  • for `if`condition you need to look at your data length and frequency of your invalid characters , if length is relatively big then i would suggest to directly go for replace – Pavneet_Singh Dec 19 '16 at 11:51
  • Surprising accept; but you'r the boss man ;-) – GhostCat Dec 19 '16 at 12:54

4 Answers4

7

The documentation doesn't tell us what replace will do if there's no matching substring, but looking at the current implementation in Oracle's version (Java 8):

public String replace(CharSequence target, CharSequence replacement) {
    return Pattern.compile(target.toString(), Pattern.LITERAL).matcher(
            this).replaceAll(Matcher.quoteReplacement(replacement.toString()));
}

...it does look like you avoid some work and in particular memory allocations (the matcher) if you check first.

Which isn't to say that there isn't likely a better way to approach those 22 replacements, probably by using a single regular expression with a character class ([âê] and so on), compiling that regex once, and then using a single matcher in a loop, very roughly like this (inspired by this answer):

// You can do this part once somewhere if you want to
Pattern regex = Pattern.compile("[âê]");
// Then:
StringBuffer resultString = new StringBuffer();
Matcher regexMatcher = regex.matcher(translatedText);
while (regexMatcher.find()) {
    String match = regexMatch.group();
    String replacement;
    switch (match) {
        // ...various cases setting `replacement`
    }
    regexMatcher.appendReplacement(resultString, replacement);
}
regexMatcher.appendTail(resultString);
translatedText = resultString.toString();

or if you want to prematurely micro-optimize it (a failing of mine):

// You can do this part once somewhere if you want to
Pattern regex = Pattern.compile("[âê]");
// Then:
StringBuffer resultString = null;
Matcher regexMatcher = regex.matcher(translatedText);
while (regexMatcher.find()) {
    if (resultString == null) {
        resultString = new StringBuffer(translatedText.length() + 100);
    }
    String match = regexMatch.group();
    String replacement;
    switch (match) {
        // ...various cases setting `replacement`
    }
    regexMatcher.appendReplacement(resultString, replacement);
}
if (resultString != null) {
    regexMatcher.appendTail(resultString);
    translatedText = resultString.toString();
}
Community
  • 1
  • 1
T.J. Crowder
  • 1,031,962
  • 187
  • 1,923
  • 1,875
3

Regarding "performance": that might really depend on the JVM version; in other words: depending on the implementation of replace() changes, having that if in place can save you some regex-matcher cost; or not.

Regarding the second question: basically you have a lot of duplicated code there. One way to work on that:

You could define some static final field, like:

Map<String, String> replacements = new HashMap<>();

To then fill with:

replacements.put("â", "a");
...

To then replace the current code with a loop that iterates the entries of that map, using each key/value as argument to a replace() call.

Or, as shown by the other answer, you do something like

replacements.put("[áàâä]", "a");

to then use replaceAll() later on.

GhostCat
  • 137,827
  • 25
  • 176
  • 248
2

You can use regex to replace all unwanted chars with your character

String s="sasaáàdaâadsasä";
System.out.println(s.replaceAll("[áàâä]", "a"));

Output :

sasaaadaaadsasa

[] mean match any occurrence of character inside this and replace if found

To replace multiple characters you can chained your replace calls and simply avoid if conditions

String s="sasaáàdaâadsêêêasä";
String str=s.replaceAll("[áàâä]", "a").replaceAll("[ê]", "e");
System.out.println(str);

Output:

sasaaadaaadseeeasa
Pavneet_Singh
  • 36,884
  • 5
  • 53
  • 68
2

If what you desire is to get rid of the apparently redundant if statements without incurring a performance penalty, the quick solution is to switch to using replace(char, char):

translatedText = translatedText.replace('â', 'a');
translatedText = translatedText.replace('ê', 'e');

This avoids regex altogether, whether explicit or hidden, and in my Java 8 also avoids making a new String if there is no replacement.

Whether there is a still better way depends on several factors including taste. A couple of the other answers have promising ideas.

Ole V.V.
  • 81,772
  • 15
  • 137
  • 161