I would like to remove any special characters like these: ☺ ☼

I only want characters A-Z, 0-9, and symbols that can be typed using the shift key and a number, such as ! and @.

Here is the code I have now; it only returns true if the string contains characters from another language.

public static boolean hasSymbols(String v) {
    boolean b = false;
    byte bytearray []  = v.getBytes(); 
    CharsetDecoder d = Charset.forName("US-ASCII").newDecoder();
    try {
        CharBuffer r = d.decode(ByteBuffer.wrap(bytearray));  
        r.toString();  
    } catch (Exception e) {
        return true;
    }
    return b;
}
eldarerathis
Kaleb
  • Are you asking for code to remove the characters, or are you asking if your function is a good way to detect them? – Always Learning Apr 18 '15 at 02:23
  • @galdre nope, that question asks to remove all "non alphanumeric characters" while here the OP asks to remove "special characters" (non-ascii if to judge by the code). – Nir Alfasi Apr 18 '15 at 02:29
  • Yes, but it's so very close -- there's nothing substantially different between the two questions, only superficially. – galdre Apr 18 '15 at 02:30
  • @galdre not true: characters like `?!/,.` should not be removed though they are not alpha-numeric – Nir Alfasi Apr 18 '15 at 02:34
  • Perhaps you could use some simple regex matching? – Natecat Apr 18 '15 at 02:47
  • @Natecat in the question that I posted as a dup - there are two nice answers. Check it out! – Nir Alfasi Apr 18 '15 at 02:54
  • I have looked into regex, and found nothing. I've done this before but lost the code. Also, @stvcisco I want to detect it and return true if it contains an abnormal character. – Kaleb Apr 18 '15 at 04:39
  • "typed using the shift key and a number, such as ! and @": that would depend on the user's keyboard, over which you presumably have no control or knowledge. You might mean your own keyboard. If so, you'll have to list the characters. – Tom Blodget Apr 18 '15 at 17:57
  • @TomBlodget The only characters I want are ASCII 0-127 and to return true if it contains an ASCII value higher than that. There's a list of all of them at http://www.ascii-code.com/ – Kaleb Apr 19 '15 at 03:54

1 Answer

There are a couple of ways to do this, depending on what exactly you want to do.

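If (as the question says) you want to remove all characters that are not "A-Z, 0-9, and symbols that can be typed using the shift key and a number, such as ! and @", the simplest way is to build a regular expression that matches the characters you want to remove, then use Matcher.find() to detect them and Matcher.replaceAll(String) to strip them. String.matches(String) is not suitable for detection, because it only succeeds when the whole string matches the pattern.

import java.util.regex.Pattern;

// Any single character that is not a letter, a digit, or a Shift+[0-9] symbol on a US keyboard.
private static final Pattern NON_NORMAL_CHARACTERS_PATTERN =
        Pattern.compile("[^A-Za-z0-9!@#$%^&*()]");

public static boolean hasSymbols(String string) {
    // find() returns true if the pattern matches anywhere in the string.
    return NON_NORMAL_CHARACTERS_PATTERN.matcher(string).find();
}

public static String removeSymbols(String string) {
    return NON_NORMAL_CHARACTERS_PATTERN.matcher(string).replaceAll("");
}

The pattern NON_NORMAL_CHARACTERS_PATTERN is a negated character class: [^...] matches any one character that is not listed inside the brackets. Letters, digits, and the symbols !@#$%^&*() are therefore kept, and everything else (including spaces and punctuation such as ? or .) counts as a symbol. Lowercase letters are included on the assumption that "A-Z" means letters in general; add any further characters you want to allow inside the brackets.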
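
To illustrate the difference between matching the whole string and finding a match anywhere in it, here is a minimal sketch. The class name MatchesVersusFind, the helper names, and the allowed character set (letters, digits, and the US-keyboard Shift+number symbols) are assumptions made for this example, not part of the question.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    // Hypothetical allowed set: letters, digits, and US-keyboard Shift+number symbols.
    static final Pattern DISALLOWED = Pattern.compile("[^A-Za-z0-9!@#$%^&*()]");

    // True if at least one disallowed character occurs anywhere in the input.
    static boolean containsDisallowed(String input) {
        return DISALLOWED.matcher(input).find();
    }

    // True only if the entire input is exactly one disallowed character,
    // because String.matches anchors the pattern to the whole string.
    static boolean isSingleDisallowed(String input) {
        return input.matches(DISALLOWED.pattern());
    }

    public static void main(String[] args) {
        System.out.println(containsDisallowed("Hello!1"));       // false
        System.out.println(containsDisallowed("Hello \u263A"));  // true
        System.out.println(isSingleDisallowed("Hello \u263A"));  // false
        System.out.println(isSingleDisallowed("\u263A"));        // true
    }
}
```

With a single-character pattern like this one, String.matches can only ever succeed on one-character strings, so find() is the call that answers "does the string contain such a character?".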

If what you want is to remove all characters that are not in the 128-character ASCII set (code points 0 to 127), you can exploit the fact that a Java char is a 16-bit code unit that can be compared directly with 127. Character.getNumericValue(char) is not suitable here: it returns the numeric value a character represents (1 for '1', for example) and -1 for most other characters, not the character's code.

public static boolean isNonASCII(char character) {
    // A char compares as its UTF-16 code unit; ASCII covers 0 through 127.
    return character > 127;
}

public static boolean hasNonASCII(String string) {
    for (char currentChar : string.toCharArray()) {
        if (isNonASCII(currentChar)) {
            return true;
        }
    }

    return false;
}

public static String removeNonASCII(String string) {
    StringBuilder stringBuilder = new StringBuilder();

    for (char currentChar : string.toCharArray()) {
        if (!isNonASCII(currentChar)) {
            stringBuilder.append(currentChar);
        }
    }

    return stringBuilder.toString();
}
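
Since the question's own attempt went through the US-ASCII charset, the same check can also be sketched with a CharsetEncoder, which works on the string directly instead of on bytes. One possible reason the question's version misses some symbols is that getBytes() without an argument uses the platform default charset, which may replace characters it cannot encode before the ASCII decoder ever sees them. The class name AsciiCheck and its method names are assumptions made for this example.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

class AsciiCheck {
    // True if any character falls outside US-ASCII (code points 0 to 127).
    static boolean hasNonAscii(String input) {
        // Encoders are not thread-safe, so a fresh one is created per call.
        CharsetEncoder encoder = StandardCharsets.US_ASCII.newEncoder();
        return !encoder.canEncode(input);
    }

    // Removes every UTF-16 code unit above 0x7F, leaving only ASCII.
    static String removeNonAscii(String input) {
        return input.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        System.out.println(hasNonAscii("abc 123 !@#"));           // false
        System.out.println(hasNonAscii("smile \u263A"));          // true
        System.out.println(removeNonAscii("a\u263Ab\u263Cc"));    // abc
    }
}
```

Note that this keeps ASCII control characters (0 to 31 and 127) as well; narrowing the range to \x20-\x7E would keep only printable ASCII.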
Timothy McCarthy
  • Neither of these worked. The first method you gave returns true for any character if the string has only one character in it. The second method works a little better, but anything other than a letter or number returns -1. The numbers it gives also do not match up with any ASCII table found online. For 1, it returns 1, etc. – Kaleb Apr 19 '15 at 03:49