1

Hello everyone. I want to ask about the memory use and running time of a method. I have the following code and want to optimize it so that it runs faster. Strings take a lot of memory; is there an alternative?

public String replaceSingleToWord(String strFileText) {

    strFileText = strFileText.replaceAll("\\b(\\d+)[ ]?'[ ]?(\\d+)\"", "$1 feet $2  ");
    strFileText = strFileText.replaceAll("\\b(\\d+)[ ]?'[ ]?(\\d+)''", "$1 feet $2     inch");

    //for 23o34'
    strFileText = strFileText.replaceAll("(\\d+)[ ]?(degree)+[ ]?(\\d+)'", "$1 degree $3 second");

    strFileText = strFileText.replaceAll("(\\d+((,|.)\\d+)?)sq", " $1 sq");

    strFileText = strFileText.replaceAll("(?i)(sq. Km.)", " sqkm");
    strFileText = strFileText.replaceAll("(?i)(sq.[ ]?k.m.)", " sqkm");
    strFileText = strFileText.replaceAll("(?i)\\s(lb.)", " pound");
    //for pound
    strFileText = strFileText.replaceAll("(?i)\\s(am|is|are|was|were)\\s?:", "$1 ");
    return strFileText;
}

I think this takes more memory and time than necessary, and I want to reduce both. What changes do I need to make? Is there an alternative to the replaceAll function? How can I shorten this code so that it runs faster and uses less memory? Thank you in advance.

Aditya

4 Answers

3

Optimization methods:

  • Use Pattern.compile() for each replacement. Create a class, make the patterns fields, and compile them only once. This saves a lot of time, because String.replaceAll() recompiles the regex on every call, which is a very costly operation.
  • Use non-greedy regexes: instead of (\\d+) use (\\d+?).
  • Try not to use regexes where possible (lb. -> pound).
  • Merge several regexes with the same substitution into one; this applies to your sqkm and feet replacements.
  • You could base your API on StringBuilder and then use Matcher.appendReplacement to process your text (before Java 9, appendReplacement accepts only a StringBuffer).

Moreover, the dot in many of your patterns is unescaped. An unescaped dot matches any character; use \\. to match a literal dot.
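To illustrate the unescaped-dot problem, here is a minimal sketch based on the question's lb. pattern. The class and method names (DotEscapeDemo, loose, strict) are made up for this example.

```java
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Unescaped dot, as in the question: '.' matches any character.
    static final Pattern LOOSE = Pattern.compile("(?i)\\s(lb.)");
    // Escaped dot: matches only a literal '.'.
    static final Pattern STRICT = Pattern.compile("(?i)\\s(lb\\.)");

    static String loose(String s) {
        return LOOSE.matcher(s).replaceAll(" pound");
    }

    static String strict(String s) {
        return STRICT.matcher(s).replaceAll(" pound");
    }

    public static void main(String[] args) {
        // The 's' of "lbs" is swallowed by the unescaped dot.
        System.out.println(loose("5 lbs of flour"));
        // With the escaped dot, "lbs" is left alone and "lb." is replaced.
        System.out.println(strict("5 lbs of flour"));
        System.out.println(strict("5 lb. of flour"));
    }
}
```

With the unescaped pattern, any character after "lb" is consumed, so words such as "lbs" are rewritten as well.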

Class idea:

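import java.util.regex.Pattern;

class RegexProcessor {
  // Compiled once per instance instead of on every replaceAll() call
  private final Pattern feet1rep = Pattern.compile("\\b(\\d+)[ ]?'[ ]?(\\d+)\"");
  // ...

  public String process(String org) {
    String mod = feet1rep.matcher(org).replaceAll("$1 feet $2  ");
    // ...
    return mod;
  }
}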
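A slightly fuller sketch of the same idea, combining several of the bullet points: patterns compiled once as static fields, the two feet patterns merged, dots escaped, and a plain String.replace for literal text. The class name PrecompiledReplacer and the merged patterns are assumptions for illustration; in particular, the merged feet pattern always emits "inch", which differs from the question's first replacement, and the literal lb. replacement is case-sensitive and matches only a preceding space.

```java
import java.util.regex.Pattern;

class PrecompiledReplacer {
    // Compiled once when the class is loaded, then reused on every call.
    // Handles both 5'10" and 5'10'' (assumed to mean the same thing).
    private static final Pattern FEET_INCH =
        Pattern.compile("\\b(\\d+) ?' ?(\\d+)(?:''|\")");

    // One pattern for "sq. km.", "sq.km.", "sq. k.m." and "sq.k.m.",
    // with every literal dot escaped.
    private static final Pattern SQ_KM =
        Pattern.compile("(?i)sq\\. ?k\\.?m\\.");

    static String process(String text) {
        String result = FEET_INCH.matcher(text).replaceAll("$1 feet $2 inch");
        result = SQ_KM.matcher(result).replaceAll(" sqkm");
        // Literal text needs no regex syntax: String.replace substitutes
        // the exact character sequence (case-sensitive here).
        return result.replace(" lb.", " pound");
    }

    public static void main(String[] args) {
        System.out.println(process("He is 5'10\" tall"));
        System.out.println(process("a 10 lb. bag"));
    }
}
```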
Dariusz
  • what is difference between (\\d+)and (\\d?+) any example please. – Aditya Oct 14 '13 at 12:42
  • @aditya [lazy and greedy](http://stackoverflow.com/questions/2301285/what-do-lazy-and-greedy-mean-in-the-context-of-regular-expressions) and [even more details](http://stackoverflow.com/questions/3075130/difference-between-and-for-regex/3075532#3075532) – Dariusz Oct 14 '13 at 12:43
  • But `\\d?+` is no replacement for `\\d+`. It should be `\\d++`, right? – maaartinus Oct 14 '13 at 12:55
  • @maaartinus `\\d+?`, I had a typo, corrected it some time ago – Dariusz Oct 14 '13 at 12:58
  • I see that you've corrected it, but somehow I thought you wanted [possessive](http://www.regular-expressions.info/possessive.html), which IMHO makes more sense. – maaartinus Oct 14 '13 at 13:12
  • @Dariusz Do you mean that I need to create one class that holds only the patterns, with a process function that contains only the matchers, and call it whenever I need it? Will that save memory? – Aditya Oct 14 '13 at 13:13
  • @Aditya the amount of memory used will not be drastically smaller that way, if at all. Using `StringBuilder` as base for your processing will save you a lot of memory allocation, that's for sure. – Dariusz Oct 14 '13 at 13:15
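The greedy, lazy, and possessive quantifiers discussed in these comments can be compared with a small sketch. The helper firstMatch is a name invented for this example; it returns the first match of a regex in the input, or null if there is none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first match of regex in input, or null if nothing matches.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: takes as many digits as possible.
        System.out.println(firstMatch("\\d+", "123 ft"));
        // Lazy: takes as few digits as possible.
        System.out.println(firstMatch("\\d+?", "123 ft"));
        // Lazy still has to expand until the rest of the pattern matches.
        System.out.println(firstMatch("\\d+? ft", "123 ft"));
        // Possessive: takes all digits and never gives any back.
        System.out.println(firstMatch("\\d++", "123 ft"));
        // Greedy backtracks one digit so the trailing \d can match.
        System.out.println(firstMatch("\\d+\\d", "123"));
        // Possessive cannot backtrack, so the trailing \d never matches.
        System.out.println(firstMatch("\\d++\\d", "123"));
    }
}
```

When the quantifier is followed by something that cannot be a digit, all three variants match the same text; they differ only in how much backtracking the engine may attempt.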
1

The StringBuffer and StringBuilder classes are used when you need to make many modifications to strings of characters.

Unlike String objects, StringBuffer and StringBuilder objects can be modified over and over again without leaving behind a lot of new unused objects.

The StringBuilder class was introduced in Java 5. The main difference between StringBuffer and StringBuilder is that StringBuilder's methods are not thread-safe (not synchronized).

Use StringBuilder whenever possible, because it is faster than StringBuffer. If thread safety is necessary, however, StringBuffer is the better option.

public class Test{

    public static void main(String args[]){
       StringBuffer sBuffer = new StringBuffer(" test");
       sBuffer.append(" String Buffer");
       System.out.println(sBuffer);
   }
}




public class StringBuilderDemo {
    public static void main(String[] args) {
        String palindrome = "Dot saw I was Tod";

        StringBuilder sb = new StringBuilder(palindrome);

        sb.reverse();  // reverse it

        System.out.println(sb);
    }
}

So you can select whichever one suits your needs.

Reference http://docs.oracle.com/javase/tutorial/java/data/buffers.html

SSP
1

Use precompiled Pattern and a loop just like Joop Eggen suggested. Group your expressions together. For example, the first two can be written like

`"\\b(\\d++) ?' ?(\\d+)(?:''|\")"`

You can go much further at the expense of readability. A single expression for all your replacements is possible, too.

`"\\b(\\d++) ?(?:' ?(?:(\\d+)(?:''|\")|degree ?(\\d++)|...)"`

Then you need to branch on conditions like group(2) == null. This gets very hard to maintain, but with a single loop and cleverly written regex you'll win the race. :D
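As a rough sketch of that branching, the following combines only the feet and degree cases into one pattern and decides on the replacement by checking which group took part in the match. The class name CombinedReplacer and the exact pattern are assumptions for illustration; the output wording follows the question's replacements.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedReplacer {
    // group(1): leading number
    // group(2): inches, set only when the feet branch matched
    // group(3): second number, set only when the degree branch matched
    private static final Pattern COMBINED = Pattern.compile(
        "\\b(\\d++) ?(?:' ?(\\d++)(?:''|\")|degree ?(\\d++)')");

    static String convert(String text) {
        Matcher m = COMBINED.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement;
            if (m.group(2) != null) {
                replacement = m.group(1) + " feet " + m.group(2) + " inch";
            } else {
                replacement = m.group(1) + " degree " + m.group(3) + " second";
            }
            // quoteReplacement keeps '$' and '\' in the text from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("5'10\" and 23 degree 34'"));
    }
}
```

The text is scanned once regardless of how many alternatives the pattern contains, which is where the single-loop approach gains over a chain of replaceAll calls.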


what will be the regex for words like can't -> canot, shouldn't -> should not etc.

It depends how exact you want to be. The most trivial way is s.replaceAll("\\Bn't\\b", " not"). The above optimizations apply, so don't ever use replaceAll when performance matters.

A general solution could go like this

Pattern SHORTENED_WORD_PATTERN =
    Pattern.compile("\\b(ca|should|wo|must|might)(n't)\\b");

String getReplacement(String trunk) {
    switch (trunk) { // needs Java 7
        case "wo": return "will not";
        case "ca": return "cannot";
        default: return trunk + " not";
    }
}

... relevant part of the replacer loop (see [replaceAll][])

    while (matcher.find()) {
        matcher.appendReplacement(result, getReplacement(matcher.group(1)));
    }
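Put together, the pieces above might look like the following self-contained sketch. The class name ContractionExpander and the method expand are invented here; the pattern and the switch follow the fragments above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContractionExpander {
    private static final Pattern SHORTENED_WORD_PATTERN =
        Pattern.compile("\\b(ca|should|wo|must|might)n't\\b");

    // Maps the part before "n't" to its expanded form.
    static String getReplacement(String trunk) {
        switch (trunk) { // switch on String needs Java 7
            case "wo": return "will not";
            case "ca": return "cannot";
            default: return trunk + " not";
        }
    }

    static String expand(String text) {
        Matcher matcher = SHORTENED_WORD_PATTERN.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(result, getReplacement(matcher.group(1)));
        }
        // Copy whatever follows the last match.
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("I can't and won't, but shouldn't."));
    }
}
```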

what should i do in case of strFileText = strFileText.replace("á", "a"); strFileText = strFileText.replace("’", "\'"); strFileText = strFileText.replace("â€Â", "\'"); strFileText = strFileText.replace("ó", "o"); strFileText = strFileText.replace("é", "e"); strFileText = strFileText.replace("á", "a"); strFileText = strFileText.replace("ç", "c"); strFileText = strFileText.replace("ú", "u"); if i want to write this in one line or other way replaceEach() is better for that case

If you go for efficiency, note that all the above strings start with the same character Ã. A single regex like á|’"|... is much slower than Ã(ƒÂƒÃ‚¡|¢Â€Â™"|...) (unless the regex engine can optimize it itself, which is currently not the case).

So write a regex where all common prefixes are extracted and use

String getReplacement(String match) {
    switch (match) { // needs Java 7
        case "á": return "a";
        case "’": return "'";
        ...
        default: throw new IllegalArgumentException("Unexpected: " + match);
    }
}

and

    while (matcher.find()) {
        matcher.appendReplacement(result, getReplacement(matcher.group()));
    }

Maybe a HashMap might be faster than the switch above.
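A minimal sketch of the map-based variant follows. It uses a few plain accented characters written as Unicode escapes as stand-ins for the strings in the question, and the names MapReplacer, REPLACEMENTS, and replace are assumptions for this example. Whether a map is actually faster than the switch would have to be measured.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MapReplacer {
    private static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("\u00e1", "a"); // a with acute accent
        REPLACEMENTS.put("\u00e9", "e"); // e with acute accent
        REPLACEMENTS.put("\u00f3", "o"); // o with acute accent
        REPLACEMENTS.put("\u2019", "'"); // right single quotation mark
    }

    // One alternation over all keys, compiled once.
    private static final Pattern KEYS = buildPattern();

    private static Pattern buildPattern() {
        StringBuilder alternation = new StringBuilder();
        for (String key : REPLACEMENTS.keySet()) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            // Pattern.quote makes the key match literally.
            alternation.append(Pattern.quote(key));
        }
        return Pattern.compile(alternation.toString());
    }

    static String replace(String text) {
        Matcher matcher = KEYS.matcher(text);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = REPLACEMENTS.get(matcher.group());
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }
}
```

Adding a new replacement then only requires a new map entry; the pattern is rebuilt from the keys when the class is loaded.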

maaartinus
  • what will be the regex for words like can't -> canot, shouldn't -> should not etc. – Aditya Oct 15 '13 at 05:07
  • what should i do in case of strFileText = strFileText.replace("á", "a"); strFileText = strFileText.replace("’", "\'"); strFileText = strFileText.replace("â€Â", "\'"); strFileText = strFileText.replace("ó", "o"); strFileText = strFileText.replace("é", "e"); strFileText = strFileText.replace("á", "a"); strFileText = strFileText.replace("ç", "c"); strFileText = strFileText.replace("ú", "u"); if i want to write this in one line or other way replaceEach() is better for that case – Aditya Oct 15 '13 at 09:22
  • @Aditya: Isn't it time for a new question? My answer is overlong already. :D – maaartinus Oct 15 '13 at 10:04
0

The regex patterns can be improved in places: use [,.] (instead of (,|.)) or " ?" (instead of [ ]?).

Use compiled static final Patterns outside the functions.

private static final Pattern PAT = Pattern.compile("...");


StringBuffer sb = new StringBuffer();
Matcher m = PAT.matcher(strFileText);
while (m.find()) {
    m.appendReplacement(sb, "...");
}
m.appendTail(sb);
strFileText = sb.toString();

This can be optimized by first testing m.find() before creating the new StringBuffer.
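A sketch of that optimization, assuming a made-up pattern and class name (LazyBufferReplacer): when nothing matches, the input string is returned as is and no buffer is allocated.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyBufferReplacer {
    // Hypothetical pattern: a number directly or loosely followed by "sq".
    private static final Pattern PAT = Pattern.compile("\\b(\\d+) ?sq\\b");

    static String replace(String text) {
        Matcher m = PAT.matcher(text);
        if (!m.find()) {
            return text; // nothing to replace: no buffer, no copy
        }
        StringBuffer sb = new StringBuffer(text.length() + 16);
        do {
            m.appendReplacement(sb, "$1 sq");
        } while (m.find());
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("20sq and 30 sq"));
    }
}
```

This pays off mainly when most inputs contain no match for a given pattern.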

BenMorel
Joop Eggen