I want to build an index for my program, and one of the most important steps is normalizing text. For example, I need to convert "[(Mac Pro @apple)]" to "macproapple", which means filtering out blank spaces, punctuation ([()]) and special characters (@). My code looks like this:

StringBuilder sb = new StringBuilder(text);
sb = filterPunctuations(sb);
sb = filterSpecialChars(sb);
sb = filterBlankSpace(sb);
sb = toLower(sb);

Because doing this with plain String operations would generate a lot of String objects, I decided to use StringBuilder. But I don't know how to implement these steps with a StringBuilder or StringBuffer. Does anyone have any suggestions? I also need to handle Chinese characters.

remy

2 Answers


You can use the replaceAll API with a regular expression:

String originalText = "[(Mac Pro @apple)]";
String removedString = originalText.replaceAll("[^\\p{L}\\p{N}]", "").toLowerCase();

Internally, the replaceAll method builds its result in a single StringBuffer, so you need not worry about many intermediate objects being created in memory.

Here is the code for replaceAll in the Matcher class:

public String replaceAll(String replacement) {
    reset();
    boolean result = find();
    if (result) {
        StringBuffer sb = new StringBuffer();
        do {
            appendReplacement(sb, replacement);
            result = find();
        } while (result);
        appendTail(sb);
        return sb.toString();
    }
    return text.toString();
}
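
If you would rather do everything in one pass with a single StringBuilder, as the question asks, here is a minimal sketch. The class name TextNormalizer and the method name normalize are made up for illustration, and it assumes that "normalize" means keeping only Unicode letters and digits and lowercasing them. It walks the string by code point, so Chinese characters are kept rather than stripped.

```java
class TextNormalizer {
    // Hypothetical helper: one pass over the input, one StringBuilder, no intermediate Strings.
    static String normalize(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);          // read a full code point, not just a char
            i += Character.charCount(cp);          // advance by 1 or 2 chars accordingly
            if (Character.isLetterOrDigit(cp)) {   // drops spaces, punctuation and symbols such as '@'
                sb.appendCodePoint(Character.toLowerCase(cp));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("[(Mac Pro @apple)]"));        // prints macproapple
        // \u82f9\u679c is the Chinese word for "apple"; both characters count as letters.
        System.out.println(normalize("\u82f9\u679c (Mac Pro)!"));
    }
}
```

Whether this is actually faster than the regex version depends on your input, so it is worth measuring before committing to it.
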
Srinivas M.V.

Try this:

class Solution
{
        public static void main (String[] args)
        {
                String s = "[(Mac Pro @apple)]";
                s = s.replaceAll("[^A-Za-z]", "");
                System.out.println(s);
        }
}

This gives the output of

MacProapple

A short explanation of the lines above:

s.replaceAll("[^A-Za-z]", "") removes every character in the string that is not in A-Z or a-z (the ^ negates the character class). Regex in Java is explained here.

If you want to convert the string to lowercase at the end, call s.toLowerCase() and assign the result back (s = s.toLowerCase()), because the method returns a new String rather than changing s.
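
Note that [^A-Za-z] also strips digits and Chinese characters. If those need to survive, a variant along these lines may work better. It is only a sketch: the class name RegexNormalizer and the constant NON_ALNUM are invented for the example. It swaps in the Unicode classes \p{L} (letters) and \p{N} (numbers) and compiles the pattern once so it can be reused for every string being indexed.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class RegexNormalizer {
    // Hypothetical constant: compiled once, matches runs of anything that is not a Unicode letter or number.
    private static final Pattern NON_ALNUM = Pattern.compile("[^\\p{L}\\p{N}]+");

    static String normalize(String text) {
        // Locale.ROOT keeps lowercasing independent of the machine's default locale.
        return NON_ALNUM.matcher(text).replaceAll("").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String s = "[(Mac Pro @apple)]";
        s = normalize(s);        // reassign: the original String object is never modified
        System.out.println(s);   // prints macproapple
    }
}
```
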

sgowd
  • Thank you. I think I will use String if I can't find a solution using StringBuffer. – remy Apr 24 '12 at 06:02
  • You're wrong. In Java a String object is immutable. Each time you change a String (for example with replaceAll()), a new String object is created. – j0ntech Apr 24 '12 at 06:03