18

This is my code to determine if a word contains any non-alphanumeric characters:

  String term = "Hello-World";
  boolean found = false;
  Pattern p = Pattern.Compile("\\W*");
  Matcher m = p.Matcher(term);
  if(matcher.find())
    found = true;

I am wondering whether the regular expression is wrong. I know "\W" matches any non-word character. Any idea what I am missing?

Johannes Jander
remo

9 Answers

16

Change your regex to:

.*\\W+.*
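
The surrounding .* only matters for a whole-string match such as String.matches; with Matcher.find, as in the question, a bare \\W should already be enough. Here is a minimal sketch of both variants. The class and method names are made up for illustration and do not come from the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class WholeMatchVsFind {
    // find() looks for a match anywhere in the input, so one \W is sufficient.
    private static final Pattern NON_WORD = Pattern.compile("\\W");

    static boolean viaFind(String term) {
        return NON_WORD.matcher(term).find();
    }

    // matches() has to consume the entire input, hence the .* on both sides.
    static boolean viaMatches(String term) {
        return term.matches(".*\\W+.*");
    }

    public static void main(String[] args) {
        System.out.println(viaFind("Hello-World"));    // true
        System.out.println(viaMatches("Hello-World")); // true
        System.out.println(viaFind("HelloWorld"));     // false
        System.out.println(viaMatches("HelloWorld"));  // false
    }
}
```

Keep in mind that \w includes the underscore, so "Hello_World" is reported as having no non-word character by either variant.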
Alex
5

It's 2016 or later, and you should think about international strings in alphabets other than Latin. The frequently cited [^a-zA-Z] falls short in that case, because it treats every non-Latin letter as non-alphabetic. There are better ways in Java now:

[^\\p{IsAlphabetic}\\p{IsDigit}]

See the reference (section "Classes for Unicode scripts, blocks, categories and binary properties"). There's also this answer that I found helpful.
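
As a rough sketch of how such a Unicode-aware class could be used with Matcher.find: the class and method names below are hypothetical. Inside a character class a caret only negates when it is the very first character, so the sketch lists the two properties back to back after a single leading caret.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class UnicodeAlnumCheck {
    // One character that is neither alphabetic nor a digit by Unicode properties.
    private static final Pattern NON_ALNUM =
            Pattern.compile("[^\\p{IsAlphabetic}\\p{IsDigit}]");

    static boolean hasNonAlphanumeric(String term) {
        return NON_ALNUM.matcher(term).find();
    }

    public static void main(String[] args) {
        // A Cyrillic word, written with Unicode escapes to keep the source ASCII-only.
        String cyrillic = "\u041f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(hasNonAlphanumeric(cyrillic));      // false
        System.out.println(hasNonAlphanumeric("Hello-World")); // true
    }
}
```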

Johannes Jander
5

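This is the expression you are looking for:

"^[a-zA-Z0-9]+$"

It matches only strings made up entirely of ASCII letters and digits. When it evaluates to false, the string does not match, which means it contains at least one non-alphanumeric character: exactly what you wanted to find.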
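
A small sketch of that "negate the match" idea, assuming String.matches is used to apply the pattern (the class and method names are invented for illustration):

```java
// Hypothetical helper class; the names are illustrative only.
final class AsciiAlnumCheck {
    static boolean hasNonAlphanumeric(String term) {
        // matches() already anchors at both ends; ^ and $ are kept to mirror the pattern above.
        return !term.matches("^[a-zA-Z0-9]+$");
    }

    public static void main(String[] args) {
        System.out.println(hasNonAlphanumeric("Hello-World")); // true
        System.out.println(hasNonAlphanumeric("Hello123"));    // false
        System.out.println(hasNonAlphanumeric(""));            // true
    }
}
```

One caveat of this approach: because of the + quantifier, the empty string does not match either, so it is also reported as "non-alphanumeric". Whether that is acceptable depends on the caller.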

javing
  • Don't forget alpha_numeric_... `"^[a-zA-Z0-9]+$"` – vbence Mar 31 '11 at 20:55
  • That does not match numerics! – Simon G. Mar 31 '11 at 20:56
  • Sorry for my English, I might not have read it correctly, but if he wants alphanumeric, the best way would be as vbence said. I just updated the answer. – javing Mar 31 '11 at 20:57
  • That's a good point. In that case the answer Alex gave would be better. But then we could also have an encoding issue if this were a web app, for example with Chinese characters. I don't think a simple regex could solve that. Let's imagine it is English :) – javing Mar 31 '11 at 21:08
3

The method names are in the wrong case.

The matcher was declared as m but used as matcher.

The quantifier should be + ("one or more") instead of * ("zero or more"). This works correctly:

String term = "Hello-World";
boolean found = false;
Pattern p = Pattern.compile("\\W+");//<-- compile( not Compile(
Matcher m = p.matcher(term);  //<-- matcher( not Matcher
if(m.find()) {  //<-- m not matcher
    found = true;
}

By the way, it would be enough to write:

boolean found = m.find();

:)

OscarRyz
2

The problem is the '*'. '*' matches ZERO or more characters, so the pattern also matches the empty string and find() succeeds on every input. You want to match at least one non-word character, so you must use '+' as the quantifier. Hence match \W+ (capital W for NON-word characters).
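
The difference is easy to see by running both quantifiers against a string with no non-word characters. This is just an illustrative sketch; findWith is a made-up helper name.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
final class StarVersusPlus {
    static boolean findWith(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        // \W* is satisfied by the empty match at index 0, so find() succeeds.
        System.out.println(findWith("\\W*", "HelloWorld"));  // true
        // \W+ needs at least one real non-word character.
        System.out.println(findWith("\\W+", "HelloWorld"));  // false
        System.out.println(findWith("\\W+", "Hello-World")); // true
    }
}
```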

eldarerathis
Simon G.
2

Your expression does not take account of possible non-English letters. It's also more complicated than it needs to be. Unless you are using regexes for some reason other than need (such as your professor having told you to), you are much better off with:

boolean found = false;
for (int i=0;i<mystring.length();++i) {
  if (!Character.isLetterOrDigit(mystring.charAt(i))) {
    found=true;
    break;
  }
}
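
The same check can also be sketched over code points instead of chars, assuming Java 8 or later for String.codePoints(). Character.isLetterOrDigit is Unicode-aware, and testing whole code points additionally handles letters outside the Basic Multilingual Plane, which a charAt loop would see as two separate surrogates. The class and method names are hypothetical.

```java
// Hypothetical helper class; the names are illustrative only.
final class CodePointCheck {
    static boolean hasNonAlphanumeric(String s) {
        // codePoints() yields full code points, so a surrogate pair is tested as one character.
        return s.codePoints().anyMatch(cp -> !Character.isLetterOrDigit(cp));
    }

    public static void main(String[] args) {
        System.out.println(hasNonAlphanumeric("Hello-World"));  // true
        System.out.println(hasNonAlphanumeric("HelloWorld42")); // false
        // A Cyrillic word, written with Unicode escapes to keep the source ASCII-only.
        System.out.println(hasNonAlphanumeric("\u041f\u0440\u0438\u0432\u0435\u0442")); // false
    }
}
```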
DJClayworth
  • Can that isLetterOrDigit() method recognize Chinese, Russian, Japanese, Indian... characters? I don't think it can. – javing Mar 31 '11 at 21:12
  • @joe larson Cool, I didn't know – javing Mar 28 '13 at 20:55
  • @sfrj - java.lang.Character provides tons of useful Unicode-aware utility methods, such as being able to get the Unicode category (Character.getType). That is the premise on which I built my JavaScript Unicode character util https://github.com/joelarson4/CharFunk which mines as much of this goodness from Java as possible and makes it available in JavaScript. – jwl Mar 28 '13 at 21:17
0

When I had to do the same thing, the regex I used was "(\w)*". I am not sure whether a capital W is the same, but I also used parentheses.

startoftext
  • They are different. `\W` (capital letter) is the inverse of `\w` - it will match any characters that are not matched by the `\w` character class. – eldarerathis Mar 31 '11 at 21:05
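
To make the relationship between the two classes concrete, here is a small sketch (names invented for illustration) that checks single characters against \w and \W, and then applies "(\w)*" as a whole-string match:

```java
// Hypothetical demo class; the names are illustrative only.
final class WordClassComplement {
    static boolean isWordChar(char c) {
        return String.valueOf(c).matches("\\w");
    }

    static boolean isNonWordChar(char c) {
        return String.valueOf(c).matches("\\W");
    }

    public static void main(String[] args) {
        System.out.println(isWordChar('a') + " " + isNonWordChar('a')); // true false
        System.out.println(isWordChar('-') + " " + isNonWordChar('-')); // false true
        // With matches(), (\w)* accepts only strings made entirely of word characters.
        System.out.println("HelloWorld".matches("(\\w)*"));  // true
        System.out.println("Hello-World".matches("(\\w)*")); // false
    }
}
```

So "(\w)*" can answer the question only when it is applied to the whole string and the result is negated; the parentheses just add a capturing group.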
0

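If you are okay with using StringUtils from Apache Commons Lang, it is as simple as the following (the method returns true when the string contains only letters and digits, so negate the result to detect a non-alphanumeric character):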

StringUtils.isAlphanumeric(inp)
karthikdivi
-1
if (value.matches(".*[^a-zA-Z0-9].*")) { // tested, seems to work.
    System.out.println("match");
} else {
    System.out.println("no match");
}
Shabbir Dhangot
gherson