2

I have a text file that contains the following information. My task is to remove special symbols from that text file. My input file contains:

This is sample CCNA program. it contains CCNP™.

My required output string:

This is sample CCNA program. it contains CCNP.

How can I do this? Please suggest an approach.

Thanks.

user2609542

10 Answers

8

This should work if you are looking to retain only ASCII (0-127) characters in your string:

String str = "This is sample CCNA program. it contains CCNP™";
str = str.replaceAll("[^\\x00-\\x7f]+", "");
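Since the question is about a text file rather than a single string, here is a minimal sketch of the same replacement applied to a file's contents. The class name AsciiFileCleaner, the helper stripNonAscii, and the temporary file are made up for illustration, and the sketch assumes the file is encoded as UTF-8; if the real file uses another charset, decode it with that charset instead.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class AsciiFileCleaner {

    // Drops every character outside the ASCII range 0-127 (same regex as above).
    static String stripNonAscii(String text) {
        return text.replaceAll("[^\\x00-\\x7f]+", "");
    }

    public static void main(String[] args) throws IOException {
        // A temporary file stands in for the real input file (assumption for the demo).
        Path file = Files.createTempFile("sample", ".txt");
        String sample = "This is sample CCNA program. it contains CCNP\u2122.";
        Files.write(file, sample.getBytes(StandardCharsets.UTF_8));

        // Read the whole file as UTF-8, clean it, and write it back.
        String content = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        Files.write(file, stripNonAscii(content).getBytes(StandardCharsets.UTF_8));

        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```

Reading the whole file into memory is fine for small files; for large ones, processing line by line would be the safer choice.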
GeekyCoder
anubhava
4

Do you want to remove all special characters from your strings? If so:

String alphaOnly = input.replaceAll("[^a-zA-Z]+","");
String alphaAndDigits = input.replaceAll("[^a-zA-Z0-9]+","");

Please see Sean Patrick Floyd's answer to a possible duplicate question.

Community
Stephen Lake
3

You can do it from a Unicode point of view:

String s = "This is sample CCNA program. it contains CCNP™. And it contains digits 123456789.";
String res = s.replaceAll("[^\\p{L}\\p{M}\\p{P}\\p{Nd}\\s]+", "");
System.out.println(res);

will print out:

This is sample CCNA program. it contains CCNP. And it contains digits 123456789.

\\p{...} is a Unicode property (the backslash is doubled because the pattern is written as a Java string literal).

\\p{L} matches letters from all languages.

\\p{M} matches a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes).

\\p{P} matches any kind of punctuation character.

\\p{Nd} matches a digit zero through nine in any script except ideographic scripts.

So this regex removes every character that is not a letter (including combined letters), a punctuation character, a digit, or a whitespace character (\\s). The ™ sign belongs to the symbol category \\p{So}, which is not in the list, so it is removed.

stema
1
^[\\u0000-\\u007F]*$

This pattern matches only strings made up entirely of ASCII characters, but you need to tell us what counts as a special character for you.
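Note that the anchored pattern only checks a string; it does not remove anything. A rough sketch of the difference, with the class and method names (AsciiCheck, isAscii, removeNonAscii) invented for this example: the original pattern is used with matches() to validate, and a negated version of the same character class is used with replaceAll() to strip.

```java
public class AsciiCheck {

    // True when the whole string consists of ASCII characters only.
    static boolean isAscii(String s) {
        return s.matches("^[\\u0000-\\u007F]*$");
    }

    // Negating the character class turns the check into a removal.
    static String removeNonAscii(String s) {
        return s.replaceAll("[^\\u0000-\\u007F]", "");
    }

    public static void main(String[] args) {
        // U+2122 is the trademark sign from the question.
        String input = "it contains CCNP\u2122";
        System.out.println(isAscii(input));                  // false
        System.out.println(removeNonAscii(input));           // it contains CCNP
        System.out.println(isAscii(removeNonAscii(input)));  // true
    }
}
```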

Distopic
0
String yourString = "This is sample CCNA program. it contains CCNP™";
// Remove only the trademark sign. String.replace takes a literal, so no regex escaping is needed.
String result = yourString.replace("™", "");
System.out.println(yourString);
System.out.println(result);
Prabhakaran Ramaswamy
0

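You can also try something like this (decompose appears to be the ICU4J com.ibm.icu.text.Normalizer method; with the JDK's java.text.Normalizer the equivalent call would be Normalizer.normalize(str, Normalizer.Form.NFD)):

Normalizer.decompose(str, false, 0).replaceAll("\\p{InSuperscriptsAndSubscripts}+", "");

but you need to find the proper Unicode block or blocks for the characters you want to remove. The ™ sign, for example, is in the Letterlike Symbols block, not in Superscripts and Subscripts.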
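As an illustration of picking a block, here is a sketch that uses the JDK's java.text.Normalizer instead of the call shown above and removes the Letterlike Symbols block, which is where ™ (U+2122) lives. The class and method names are hypothetical, and whether a whole block is the right granularity depends on your data: this block also contains characters such as ℃ and №.

```java
import java.text.Normalizer;

public class BlockFilter {

    static String removeLetterlikeSymbols(String s) {
        // NFD splits accented letters into base letter + combining mark.
        // It leaves U+2122 unchanged, so the block match below still sees it.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Remove every character that belongs to the Letterlike Symbols block.
        return decomposed.replaceAll("\\p{InLetterlikeSymbols}+", "");
    }

    public static void main(String[] args) {
        String input = "This is sample CCNA program. it contains CCNP\u2122.";
        System.out.println(removeLetterlikeSymbols(input));
    }
}
```

The returned string stays in decomposed form; if that matters, normalize it back with Normalizer.Form.NFC afterwards.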

agad
0

You'd have to really define what special characters are in your instance.

If you are not a fan of RegEx, you could consider using some methods out of the Character class. See sample below:

public class Test {

    public static void main(String[] args) {

        String test = "This is sample CCNA program. it contains CCNP™";

        System.out.println("Character\tLetter or Digit\tWhitespace");

        for (char c : test.toCharArray()) {
            System.out.println(
                    c + "\t\t"
                    + Character.isLetterOrDigit(c) + "\t\t" 
                    + Character.isWhitespace(c));
        }
    }
}

There are other methods that you could use in addition to the ones above. Look at the Character class API.
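For example, Character.getType reports the Unicode general category of a character, which can be used to filter instead of just print. The sketch below (SymbolFilter and removeSymbols are names made up here, and it needs Java 8 or later) drops everything in the symbol categories. That is one possible definition of "special"; it also removes ASCII symbols such as + and $, so adjust the categories to taste.

```java
public class SymbolFilter {

    // Removes code points in the Unicode symbol categories (Sm, Sc, Sk, So).
    static String removeSymbols(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        s.codePoints().forEach(cp -> {
            switch (Character.getType(cp)) {
                case Character.MATH_SYMBOL:
                case Character.CURRENCY_SYMBOL:
                case Character.MODIFIER_SYMBOL:
                case Character.OTHER_SYMBOL:
                    break; // skip symbols
                default:
                    sb.appendCodePoint(cp); // keep letters, digits, punctuation, whitespace, ...
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+2122 (trademark sign) has the category OTHER_SYMBOL.
        String test = "This is sample CCNA program. it contains CCNP\u2122.";
        System.out.println(removeSymbols(test));
    }
}
```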

STM
0

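An alternative to regex that excludes characters above 127 (non-ASCII).

    String s = "This is sample CCNA program. it contains CCNP™";

    for (int i = 0; i < s.length(); i++) {
        if (s.charAt(i) > 127) {
            // Remove the character at i, then step back so the character
            // that shifted into position i is checked on the next pass.
            s = s.substring(0, i)
                    + s.substring(i + 1);
            i--;
        }
    }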
art1go
0
import java.util.Scanner;

public class replacespecialchar {

    public static void main(String[] args) {

        Scanner in = new Scanner(System.in);
        System.out.println("enter string with special char");
        String before = in.nextLine();

        // Keep only ASCII letters (A-Z, a-z); everything else, including
        // spaces, digits and punctuation, is dropped.
        StringBuilder after = new StringBuilder();
        for (int i = 0; i < before.length(); i++) {
            char c = before.charAt(i);
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                after.append(c);
            }
        }

        System.out.println("String with special char " + before);
        System.out.println("String without special char " + after);
    }
}
Ravan Scafi
ishan
0

The answer above about removing characters > 128 was very helpful. Thank you.

However, the loop as posted there did not cover some situations, such as two bad characters in a row or a bad character at the end of the string. Here are my modifications, which remove all special characters except tab and newline.

  // Remove all special characters except tab and linefeed
  public static String cleanTextBoxData(String value) {
    if (value == null) {
      return null;
    }
    StringBuilder cleaned = new StringBuilder(value.length());
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      // Keep printable ASCII (32-126) plus tab (9) and linefeed (10)
      if ((c >= 32 && c <= 126) || c == 9 || c == 10) {
        cleaned.append(c);
      }
    }
    int dif = value.length() - cleaned.length();
    if (dif > 0) {
      // logger is assumed to be an SLF4J-style logger field of the enclosing class
      logger.warn("Found and removed {} bad characters from text box.", dif);
    }
    return cleaned.toString();
  }
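If a regex is acceptable after all, the same filter can probably be written as a single replaceAll. This is only a sketch of the equivalent idea, not a drop-in replacement for the method above: the TextBoxCleaner class and its clean method are hypothetical, and the logging is left out.

```java
public class TextBoxCleaner {

    static String clean(String value) {
        if (value == null) {
            return null;
        }
        // Keep space through tilde (printable ASCII) plus tab and linefeed; drop everything else.
        return value.replaceAll("[^\\x20-\\x7E\\t\\n]", "");
    }

    public static void main(String[] args) {
        // Two bad characters in a row and a bad character at the end of the string.
        String input = "a\u2122\u2122b\t\nc\u2122";
        System.out.println(clean(input).equals("ab\t\nc")); // true
    }
}
```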
JBAIRD