233

I tried using this, but it didn't work:

return value.replaceAll("/[^A-Za-z0-9 ]/", "");
Alex Gomes

14 Answers

295

Use [^A-Za-z0-9].

Note: I removed the space, since it is not typically considered alphanumeric.

Dave Jarvis
Mirek Pluta
  • 10
    Neither should the space at the end of the character class. – Andrew Duffy Nov 26 '09 at 20:31
  • The regex is fine; just remove the "/" from the regex string, changing value.replaceAll("/[^A-Za-z0-9 ]/", ""); to value.replaceAll("[^A-Za-z0-9 ]", ""); You don't need the "/" inside the regex. I think you've confused it with JavaScript patterns. – erik.aortiz May 01 '20 at 21:06
  • 1
    Note that this only works with the Latin alphabet and doesn't work with accented characters or any "special" character set. – SüniÚr Jul 31 '20 at 06:42
147

Try

return value.replaceAll("[^A-Za-z0-9]", "");

or

return value.replaceAll("[\\W]|_", "");
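A quick way to compare the two patterns above is to run them on the same input. This is only a sketch: the class name `WordClassDemo`, the method names, and the sample string are made up for illustration. Since `\W` treats the underscore as a word character, the `|_` alternative is what removes it.

```java
final class WordClassDemo {
    // Explicit negated class: anything outside A-Z, a-z, 0-9 is removed.
    static String explicitClass(String value) {
        return value.replaceAll("[^A-Za-z0-9]", "");
    }

    // \W matches non-word characters; "_" counts as a word character,
    // so it has to be listed separately to be removed as well.
    static String wordClass(String value) {
        return value.replaceAll("[\\W]|_", "");
    }

    public static void main(String[] args) {
        String value = "snake_case-42 !";
        System.out.println(explicitClass(value)); // snakecase42
        System.out.println(wordClass(value));     // snakecase42
    }
}
```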
Andrew Duffy
88

You should be aware that [^a-zA-Z] will replace every character that is not itself in the range A-Z/a-z. That means special characters like é and ß, Cyrillic characters, and so on will be removed.

If removing these characters is not wanted, use predefined character classes instead:

 str.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");

PS: \p{Alnum} does not achieve this effect; it acts the same as [A-Za-z0-9].
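To make the difference concrete, here is a small sketch that runs the ASCII-only class, the Unicode property classes, and `\P{Alnum}` over the same accented input. The class name `UnicodeStripDemo`, the method names, and the sample text are assumptions chosen for illustration; the non-ASCII characters are written as Unicode escapes so the result does not depend on the source file encoding.

```java
final class UnicodeStripDemo {
    // ASCII-only: accented letters are treated as "non-alphanumeric".
    static String asciiOnly(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    // Unicode binary properties: letters and digits from any script are kept.
    static String unicodeAware(String s) {
        return s.replaceAll("[^\\p{IsAlphabetic}\\p{IsDigit}]", "");
    }

    // POSIX class, ASCII-only unless UNICODE_CHARACTER_CLASS is enabled.
    static String posixAlnum(String s) {
        return s.replaceAll("\\P{Alnum}", "");
    }

    public static void main(String[] args) {
        // "Crème brûlée, 2 €!"
        String s = "Cr\u00e8me br\u00fbl\u00e9e, 2 \u20ac!";
        System.out.println(asciiOnly(s));    // Crmebrle2
        System.out.println(unicodeAware(s)); // Crèmebrûlée2
        System.out.println(posixAlnum(s));   // Crmebrle2
    }
}
```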

hendrix
Andre Steingress
  • 13
    Thanks a lot for this post - it was very useful to me. Additionally, I believe this is the actual answer to the question. The Latin alphabet isn't the only one in the world! – Mateva Oct 15 '15 at 07:15
  • 2
    Actually, the stated regex will treat "^" as a valid character, since only the first occurrence of "^" is negating the meaning of the selection. `[^\\p{IsAlphabetic}\\p{IsDigit}]` works well. – Bogdan Klichuk Jan 19 '18 at 17:22
  • 1
    @JakubTurcovsky https://docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html defines IsAlphabetic and IsDigit as binary properties. Alpha and Digit are POSIX character classes (US-ASCII only), unless the https://docs.oracle.com/javase/10/docs/api/java/util/regex/Pattern.html#UNICODE_CHARACTER_CLASS flag is specified. – Andre Steingress Apr 17 '18 at 14:39
  • @AndreSteingress Correct, the reason `{IsDigit}` doesn't work for me and `{Digit}` does is that I'm trying this on Android. And Android has `UNICODE_CHARACTER_CLASS` turned on by default. Thanks for the clarification. – Jakub Turcovsky Apr 30 '18 at 11:28
  • How to only allow Alpha, Digit, and Emoji? – Robert Goodrick Aug 14 '18 at 17:29
64
return value.replaceAll("[^A-Za-z0-9 ]", "");

This will leave spaces intact. I assume that's what you want. Otherwise, remove the space from the regex.

erickson
23

You could also try this simpler regex:

 str = str.replaceAll("\\P{Alnum}", "");
nhinkle
saurav
12

Java's regular expressions don't require you to put a forward-slash (/) or any other delimiter around the regex, as opposed to other languages like Perl, for example.
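The effect of the stray slashes can be seen in a short sketch. The class name `SlashDemo`, the method names, and the sample inputs are invented for illustration. With the slashes in place, the pattern only matches a literal "/", then one disallowed character, then another literal "/".

```java
final class SlashDemo {
    // Pattern from the question: the slashes are matched literally.
    static String withSlashes(String value) {
        return value.replaceAll("/[^A-Za-z0-9 ]/", "");
    }

    // Same character class without the JavaScript/Perl-style delimiters.
    static String withoutSlashes(String value) {
        return value.replaceAll("[^A-Za-z0-9 ]", "");
    }

    public static void main(String[] args) {
        System.out.println(withSlashes("Hello, World!"));    // Hello, World!
        System.out.println(withoutSlashes("Hello, World!")); // Hello World
        System.out.println(withSlashes("x/!/y, z"));         // xy, z
    }
}
```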

abyx
12

Solution:

value.replaceAll("[^A-Za-z0-9]", "")

Explanation:

[^abc] When a caret ^ appears as the first character inside square brackets, it negates the pattern. This pattern matches any character except a or b or c.

Think of the brackets as two functions:

  • [(Pattern)] = match(Pattern)
  • [^(Pattern)] = notMatch(Pattern)

Moreover, regarding the pattern:

  • A-Z = all characters included from A to Z

  • a-z = all characters included from a to z

  • 0-9 = all characters included from 0 to 9

Therefore, it will remove every character NOT included in the pattern.
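The explanation above can be checked with a small sketch. The class name `NegationDemo`, the method names, and the inputs are made up for illustration.

```java
final class NegationDemo {
    // [abc] matches a, b or c, so those characters are the ones removed.
    static String removeAbc(String s) {
        return s.replaceAll("[abc]", "");
    }

    // [^abc] matches everything except a, b or c, so only those survive.
    static String keepAbc(String s) {
        return s.replaceAll("[^abc]", "");
    }

    // The same idea with the three ranges A-Z, a-z and 0-9.
    static String keepAlphanumeric(String s) {
        return s.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(removeAbc("aXbYc"));                // XY
        System.out.println(keepAbc("aXbYc"));                  // abc
        System.out.println(keepAlphanumeric("R2-D2 & C-3PO")); // R2D2C3PO
    }
}
```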

Community
GalloCedrone
8

I made this method for creating filenames:

public static String safeChar(String input)
{
    char[] allowed = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_".toCharArray();
    char[] charArray = input.toCharArray();
    StringBuilder result = new StringBuilder();
    for (char c : charArray)
    {
        for (char a : allowed)
        {
            if(c==a) result.append(a);
        }
    }
    return result.toString();
}
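For comparison, here is a sketch of two shorter ways to get the same whitelist behaviour, assuming the same set of allowed characters. The class name `SafeFileName` and the method names are hypothetical. One variant looks each character up with `String.indexOf` instead of an inner loop, and the other uses a negated character class with the hyphen placed last so it is taken literally.

```java
final class SafeFileName {
    private static final String ALLOWED =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_";

    // Keep a character only if it appears in the whitelist.
    static String safeChar(String input) {
        StringBuilder result = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (ALLOWED.indexOf(c) >= 0) {
                result.append(c);
            }
        }
        return result.toString();
    }

    // Regex variant: remove everything outside the same whitelist.
    static String safeCharRegex(String input) {
        return input.replaceAll("[^0-9A-Za-z_-]", "");
    }

    public static void main(String[] args) {
        System.out.println(safeChar("my report (v2).txt"));      // myreportv2txt
        System.out.println(safeCharRegex("my report (v2).txt")); // myreportv2txt
    }
}
```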
zneo
3

If you also want to allow alphanumeric characters that don't belong to the ASCII character set, such as German umlauts, you can consider the following solution:

 String value = "your value";

 // this could be placed as a static final constant, so the compiling is only done once
 Pattern pattern = Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

 value = pattern.matcher(value).replaceAll("");

Please note that using the UNICODE_CHARACTER_CLASS flag may impose a performance penalty (see the Javadoc of this flag).
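A sketch of the precompiled-constant variant mentioned in the code comment, with a plain `replaceAll` alongside for comparison. The class name `UnicodeWordDemo`, the method names, and the sample string are assumptions for illustration. One thing worth knowing: `\w` includes the underscore, so both variants keep it.

```java
import java.util.regex.Pattern;

final class UnicodeWordDemo {
    // Compiled once and reused; \w is Unicode-aware because of the flag.
    private static final Pattern NON_WORD =
            Pattern.compile("[^\\w]", Pattern.UNICODE_CHARACTER_CLASS);

    static String stripUnicodeAware(String value) {
        return NON_WORD.matcher(value).replaceAll("");
    }

    // Without the flag, \w only covers [a-zA-Z_0-9].
    static String stripAsciiOnly(String value) {
        return value.replaceAll("[^\\w]", "");
    }

    public static void main(String[] args) {
        // "Grüße_2024!"
        String value = "Gr\u00fc\u00dfe_2024!";
        System.out.println(stripUnicodeAware(value)); // Grüße_2024
        System.out.println(stripAsciiOnly(value));    // Gre_2024
    }
}
```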

snap
1

Simple method:

public boolean isBlank(String value) {
    return (value == null || value.equals("") || value.equals("null") || value.trim().equals(""));
}

public String normalizeOnlyLettersNumbers(String str) {
    if (!isBlank(str)) {
        return str.replaceAll("[^\\p{L}\\p{Nd}]+", "");
    } else {
        return "";
    }
}
Alberto Cerqueira
1
public static void main(String[] args) {
    String value = " Chlamydia_spp. IgG, IgM & IgA Abs (8006) ";

    System.out.println(value.replaceAll("[^A-Za-z0-9]", ""));

}

output: ChlamydiasppIgGIgMIgAAbs8006

Github: https://github.com/AlbinViju/Learning/blob/master/StripNonAlphaNumericFromString.java

Jason Roman
Albin
1

Using Guava, you can easily combine different types of criteria. For your specific case you can use:

value = CharMatcher.inRange('0', '9')
        .or(CharMatcher.inRange('a', 'z'))
        .or(CharMatcher.inRange('A', 'Z'))
        .retainFrom(value);
Deb
0

Guava's CharMatcher provides a concise solution:

output = CharMatcher.javaLetterOrDigit().retainFrom(input);
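If adding Guava is not an option, a roughly similar filter can be sketched with the JDK alone (Java 8 or later). The class and method names here are hypothetical, and this is not Guava's implementation: it works on code points rather than `char` values, so results may differ for characters outside the Basic Multilingual Plane.

```java
final class LetterOrDigitFilter {
    // Keep only code points that Character.isLetterOrDigit accepts.
    static String retainLettersAndDigits(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(Character::isLetterOrDigit)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(retainLettersAndDigits("Chlamydia_spp. IgG (8006)"));
        // ChlamydiasppIgG8006
    }
}
```

Like the Guava matcher, this keeps letters and digits from any script, not just ASCII.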
Bunarro
0

Dart

If you tried this and it didn't work:

value.replaceAll("[^A-Za-z0-9]", "");

Just use RegExp like this:

value.replaceAll(RegExp("[^A-Za-z0-9]"), "");