
I am trying to find all characters that are not letters (upper/lowercase), digits, or underscores, and remove them.

stringA.replaceAll("[^a-zA-Z0-9_]","")   // works perfectly fine

However, the following code does not even compile in Java:

stringA.replaceAll("\W","");
// or also
stringA.replaceAll("[\W]","");
// or also
stringA.replaceAll("[\\W]","");

If I use "\\W" instead of "\W", the code above compiles and works.
So, what is the difference between \W and \\W, and when should I use brackets like [^a-zA-Z0-9_]?

EuberDeveloper
Yao
  • Escape the backslash one more time, and don't forget the semicolon at the end: `stringA.replaceAll("\\W","");` – Avinash Raj May 22 '15 at 12:54
  • Escape the escaper! \ is not only a Regex escape char, it's a Java escape char as well! – Davio May 22 '15 at 12:55

1 Answer


However, the following code does not even compile in Java

The Java compiler has no idea that the string is going to be passed to the regex engine. Anything in double quotes is a string literal to the compiler, so it tries to interpret \W as a Java escape sequence. No such escape sequence exists, which triggers a compile-time error.

If I use only \\W rather than \W, the above code turns out to be correct.

This is because \\ is a valid Java escape sequence that means "a single backslash". When you put two backslashes inside a string literal, the Java compiler turns them into one, so the regex engine sees \W, not \\W.
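To make the two layers of escaping visible, here is a minimal sketch. The class name `BackslashDemo` and the helper `stripNonWord` are made up for illustration; only `String.replaceAll` and `String.length` come from the JDK.

```java
class BackslashDemo {
    // Hypothetical helper: removes every character matched by the regex \W.
    static String stripNonWord(String input) {
        // Source text "\\W" -> string value \W (two characters) -> regex "non-word character"
        return input.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        String pattern = "\\W";
        System.out.println(pattern);          // prints \W
        System.out.println(pattern.length()); // prints 2
        System.out.println(stripNonWord("a-b_c d!1")); // prints ab_cd1
    }
}
```

The literal is four characters long in the source file (two backslashes, a W, counting neither quote), but the string it produces holds only a backslash and a W, which is exactly what the regex engine needs.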

So, what is the difference between \W and \\W, and when should I use brackets like [^a-zA-Z0-9_]?

The third one, [^a-zA-Z0-9_], is a longer spelling of the second one, "\\W"; the first one, "\W", does not compile.
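As a quick check of that equivalence, the sketch below runs the shorthand and the explicit character class on the same input. The class and method names are hypothetical, and the comparison assumes the default regex flags. The bracketed form "[\\W]" is included too, since with the doubled backslash it compiles and behaves the same way.

```java
class NonWordEquivalence {
    // Shorthand: \W means "any character that is not a word character".
    static String viaShorthand(String s) {
        return s.replaceAll("\\W", "");
    }

    // Explicit negated character class spelling out the same set.
    static String viaExplicitClass(String s) {
        return s.replaceAll("[^a-zA-Z0-9_]", "");
    }

    // The shorthand wrapped in brackets; the brackets add nothing here.
    static String viaBracketedShorthand(String s) {
        return s.replaceAll("[\\W]", "");
    }

    public static void main(String[] args) {
        String sample = "Hello, World_42! (test)";
        System.out.println(viaShorthand(sample));          // prints HelloWorld_42test
        System.out.println(viaExplicitClass(sample));      // prints HelloWorld_42test
        System.out.println(viaBracketedShorthand(sample)); // prints HelloWorld_42test
    }
}
```

Brackets are only needed when you want to combine several items into one class, for example [^\\w-] to keep hyphens as well. Note that if a pattern is compiled with Pattern.UNICODE_CHARACTER_CLASS, \w also covers non-ASCII letters and digits, so the two spellings stop being interchangeable.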

Community
Sergey Kalinichenko