How can I remove non-ASCII characters ("Alt codes") such as → ← █ ◄ ► ∙ from a string?

M. Justin

1 Answer

From your comment, by "AltCode", you're referring to any non-ASCII character.

One solution to this problem is to use the method String.replaceAll(String regex, String replacement). This method replaces every match of the given regular expression (regex) with the given replacement string. From its Javadoc:

Replaces each substring of this string that matches the given regular expression with the given replacement.

Java's regex syntax has the "\p{ASCII}" character class, which matches only ASCII characters (U+0000 through U+007F). Placing it inside a negated character class, "[^…]", matches any non-ASCII character instead. The matched characters can then be replaced with the empty string, which removes them from the resulting string.

String s = "A→←B█◄C►";
String stripped = s.replaceAll("[^\\p{ASCII}]", "");
System.out.println(stripped); // Prints "ABC"
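
As a rough sketch of the same idea, the block below compares the "\p{ASCII}" class with the explicit range "[^\x00-\x7F]", which should behave identically because the Pattern documentation defines \p{ASCII} as [\x00-\x7F]. The class name NonAsciiDemo and both method names are made up for illustration. The sample string is written with Unicode escapes so that the source file's encoding does not matter.

```java
public class NonAsciiDemo {
    // Hypothetical helper: removes everything outside U+0000..U+007F
    // using the predefined \p{ASCII} class.
    public static String stripWithClass(String s) {
        return s.replaceAll("[^\\p{ASCII}]", "");
    }

    // Hypothetical helper: same idea, with the ASCII range spelled out.
    public static String stripWithRange(String s) {
        return s.replaceAll("[^\\x00-\\x7F]", "");
    }

    public static void main(String[] args) {
        // A, right arrow, B, space, "caf", e-acute, space, C, bullet operator
        String s = "A\u2192B caf\u00E9 C\u2219";
        System.out.println(stripWithClass(s)); // prints "AB caf C"
        System.out.println(stripWithRange(s)); // prints "AB caf C"
    }
}
```

Note that accented letters such as é are removed as well, since they are not ASCII. If those should survive, a different character class would be needed.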

The full list of supported regex constructs is documented in the Javadoc for the java.util.regex.Pattern class.

Note: If you are going to use this pattern many times within a run, it is more efficient to use a compiled Pattern directly rather than String.replaceAll. This way the pattern is compiled only once and reused, instead of being recompiled each time replaceAll is called:

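import java.util.regex.Pattern;

public class AsciiStripper {
    // Compiled once and reused for every call.
    private static final Pattern NON_ASCII_PATTERN = Pattern.compile("[^\\p{ASCII}]");

    // Returns s with all non-ASCII characters removed.
    public String stripNonAscii(String s) {
        return NON_ASCII_PATTERN.matcher(s).replaceAll("");
    }
}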
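
For illustration, here is a minimal sketch of how a precompiled pattern might be reused across many strings. The class BatchStripper and its stripAll method are hypothetical names and not part of any library. The inputs use Unicode escapes for the arrow and block characters from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class BatchStripper {
    // Compiled a single time when the class is loaded.
    private static final Pattern NON_ASCII = Pattern.compile("[^\\p{ASCII}]");

    // Hypothetical helper: strips non-ASCII characters from each input,
    // reusing the same compiled pattern for every element.
    public static List<String> stripAll(List<String> inputs) {
        return inputs.stream()
                .map(s -> NON_ASCII.matcher(s).replaceAll(""))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList(
                "A\u2192\u2190B",        // A, right arrow, left arrow, B
                "plain",                 // already pure ASCII
                "\u2588\u25C4C\u25BA");  // full block, left pointer, C, right pointer
        System.out.println(stripAll(inputs)); // prints "[AB, plain, C]"
    }
}
```

Each call still creates a new Matcher, but that is cheap compared with recompiling the regular expression.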
M. Justin