Java regex: check if word has non alphanumeric characters

Question

This is my code to determine if a word contains any non-alphanumeric characters:

  String term = "Hello-World";
  boolean found = false;
  Pattern p = Pattern.Compile("\\W*");
  Matcher m = p.Matcher(term);
  if(matcher.find())
    found = true;

I am wondering if the regex is wrong. I know "\W" matches any non-word character. Any idea what I am missing?

score 16 · Accepted Answer · answered Mar 31 '11 at 20:53

Change your regex to:

.*\\W+.*
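Whether the surrounding `.*` is needed depends on which method is called, which the answer does not spell out. A minimal sketch, assuming Java 8 or later (the class and method names are illustrative and not from the question):

```java
import java.util.regex.Pattern;

class NonWordCheck {
    // find() looks for the pattern anywhere in the input, so a bare \W is enough.
    private static final Pattern NON_WORD = Pattern.compile("\\W");

    static boolean hasNonWordViaFind(String term) {
        return NON_WORD.matcher(term).find();
    }

    // String.matches() must consume the whole input, hence the .* on both sides.
    static boolean hasNonWordViaMatches(String term) {
        return term.matches(".*\\W+.*");
    }

    public static void main(String[] args) {
        System.out.println(hasNonWordViaFind("Hello-World"));    // true
        System.out.println(hasNonWordViaMatches("Hello-World")); // true
        System.out.println(hasNonWordViaFind("HelloWorld"));     // false
        System.out.println(hasNonWordViaFind("Hello_World"));    // false: _ counts as a word character
    }
}
```

Note that `\W` is the complement of `[a-zA-Z_0-9]`, so an underscore is not reported as non-alphanumeric by either variant.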

answered Mar 31 '11 at 20:53

Alex

score 5 · Answer 2 · answered Mar 02 '16 at 11:00

It's 2016 or later, and you should think about international strings from alphabets other than Latin. The frequently cited [^a-zA-Z] treats every non-Latin letter as non-alphabetic, so it gives wrong results in that case. There are better ways in Java now:

[^\\p{IsAlphabetic}\\p{IsDigit}]

See the reference (section "Classes for Unicode scripts, blocks, categories and binary properties"). There's also this answer that I found helpful.
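As a rough illustration of the Unicode-aware approach, the sketch below uses the binary properties `IsAlphabetic` and `IsDigit` (available since Java 7). The class is written with a single leading `^`, because Java treats a caret that is not the first character of a character class as a literal caret. The class and method names are made up for this example.

```java
import java.util.regex.Pattern;

class UnicodeAlnumCheck {
    // Matches any single character that is neither alphabetic nor a digit.
    private static final Pattern NON_ALNUM =
            Pattern.compile("[^\\p{IsAlphabetic}\\p{IsDigit}]");

    static boolean hasNonAlphanumeric(String term) {
        return NON_ALNUM.matcher(term).find();
    }

    public static void main(String[] args) {
        // Russian "Privet" written with Unicode escapes to stay encoding-neutral.
        String russian = "\u041f\u0440\u0438\u0432\u0435\u0442";
        System.out.println(hasNonAlphanumeric(russian));       // false
        System.out.println(hasNonAlphanumeric("Hello-World")); // true
        System.out.println(hasNonAlphanumeric("a^b"));         // true
    }
}
```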

score 5 · Answer 3 · edited Mar 31 '11 at 20:59

This is the expression you are looking for:

"^[a-zA-Z0-9]+$"

When matching against it evaluates to false, the string contains at least one character that is not alphanumeric, which is what you wanted to detect.
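A small sketch of this "whitelist and negate" idea, with a hypothetical helper name. One caveat is labeled in the code: the empty string also fails the `+` quantifier, so it is reported as containing a non-alphanumeric character even though it contains nothing at all.

```java
class AlnumWhitelist {
    // True when the term is not made up exclusively of ASCII letters and digits.
    static boolean hasNonAlphanumeric(String term) {
        return !term.matches("^[a-zA-Z0-9]+$");
    }

    public static void main(String[] args) {
        System.out.println(hasNonAlphanumeric("HelloWorld42")); // false
        System.out.println(hasNonAlphanumeric("Hello-World"));  // true
        // Caveat: the empty string does not match, so this prints true.
        System.out.println(hasNonAlphanumeric(""));
    }
}
```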

edited Mar 31 '11 at 20:59

answered Mar 31 '11 at 20:53

javing

Don't forget alpha_numeric_... `"^[a-zA-Z0-9]+$"` – vbence Mar 31 '11 at 20:55
that does not match numerics! – Simon G. Mar 31 '11 at 20:56
Sorry, my English is not great and I may not have read it correctly, but if he wants alphanumeric, the best way would be as vbence said. I just updated the answer. – javing Mar 31 '11 at 20:57
That's a good point. In that case the answer Alex gave would be better. But then we could also have an encoding issue if this were a web app, for example with Chinese characters. I don't think a simple regex could solve that. Let's imagine it is English :) – javing Mar 31 '11 at 21:08

score 3 · Answer 4 · answered Mar 31 '11 at 21:01

The method names are in the wrong case.

The matcher was declared as m but used as matcher.

The quantifier should be "one or more" (+) instead of "zero or more" (*). This works correctly:

String term = "Hello-World";
boolean found = false;
Pattern p = Pattern.compile("\\W+");//<-- compile( not Compile(
Matcher m = p.matcher(term);  //<-- matcher( not Matcher
if(m.find()) {  //<-- m not matcher
    found = true;
}

Btw, it would be enough to write:

boolean found = m.find();

:)

score 2 · Answer 5 · edited Mar 31 '11 at 21:03

The problem is the '*'. '*' matches ZERO or more characters, so the pattern also matches the empty string and find() always succeeds. You want to match at least one non-word character, so you must use '+' as the quantifier. Hence match \W+ (capital W for NON-word).
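The effect of the zero-length match can be seen directly. The following sketch (class and helper names are illustrative) runs both quantifiers against a string with no non-word characters:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StarVersusPlus {
    // Returns whether the regex is found anywhere in the term.
    static boolean findWith(String regex, String term) {
        return Pattern.compile(regex).matcher(term).find();
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\W*").matcher("HelloWorld");
        if (m.find()) {
            // The match is empty: it starts and ends at index 0.
            System.out.println("start=" + m.start() + " end=" + m.end());
        }
        System.out.println(findWith("\\W*", "HelloWorld"));  // true, which is misleading
        System.out.println(findWith("\\W+", "HelloWorld"));  // false
        System.out.println(findWith("\\W+", "Hello-World")); // true
    }
}
```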

edited Mar 31 '11 at 21:03

eldarerathis

answered Mar 31 '11 at 20:57

Simon G.

score 2 · Answer 6 · answered Mar 31 '11 at 21:04

Your expression does not take account of possible non-English letters. It's also more complicated than it needs to be. Unless you are using regexes for some reason other than need (such as your professor having told you to), you are much better off with:

boolean found = false;
for (int i=0;i<mystring.length();++i) {
  if (!Character.isLetterOrDigit(mystring.charAt(i))) {
    found=true;
    break;
  }
}
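The loop above inspects one `char` at a time, which splits characters outside the Basic Multilingual Plane into two surrogates that are not letters. A possible variant, assuming Java 8 or later, iterates over code points instead (the class and method names are illustrative only):

```java
class LetterOrDigitCheck {
    // Iterates by code point so supplementary characters are tested as a whole.
    static boolean hasNonAlphanumeric(String s) {
        return s.codePoints().anyMatch(cp -> !Character.isLetterOrDigit(cp));
    }

    public static void main(String[] args) {
        System.out.println(hasNonAlphanumeric("HelloWorld42")); // false
        System.out.println(hasNonAlphanumeric("Hello-World"));  // true
        // Two Chinese characters, written as escapes to stay encoding-neutral.
        System.out.println(hasNonAlphanumeric("\u4e2d\u6587")); // false
        // Unlike the regex \W, this treats the underscore as non-alphanumeric.
        System.out.println(hasNonAlphanumeric("Hello_World"));  // true
    }
}
```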

answered Mar 31 '11 at 21:04

DJClayworth

Can that isLetterOrDigit() method recognize Chinese, Russian, Japanese, Indian... characters? I don't think it can. – javing Mar 31 '11 at 21:12
@joe larson Cool, I didn't know – javing Mar 28 '13 at 20:55
@sfrj - java.lang.Character provides tons of useful Unicode-aware utility methods, such as being able to get the unicode category (Character.getType). That is the premise on which I build my JavaScript unicode character util https://github.com/joelarson4/CharFunk which mines as much of this goodness from Java as possible and makes it available in JavaScript. – jwl Mar 28 '13 at 21:17

score 0 · Answer 7 · answered Mar 31 '11 at 20:54

When I had to do this same thing, the regex I used was "(\w)*". Not sure if capital W is the same, but I also used parentheses.

answered Mar 31 '11 at 20:54

startoftext

They are different. `\W` (capital letter) is the inverse of `\w` - it will match any characters that are not matched by the `\w` character class. – eldarerathis Mar 31 '11 at 21:05

score 0 · Answer 8 · answered Apr 30 '20 at 17:44

If you are okay with using Apache Commons Lang StringUtils, then it's as simple as the following:

StringUtils.isAlphanumeric(inp)

answered Apr 30 '20 at 17:44

karthikdivi

score -1 · Answer 9 · edited May 12 '16 at 11:42

if (value.matches(".*[^a-zA-Z0-9].*")) { // tested, seems to work.
    System.out.println("match");
} else {
    System.out.println("no match");
}
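One edge case worth knowing about, although it rarely matters for single words: `String.matches` has to consume the entire input, and `.` does not match line terminators by default, so the pattern above can return false for multi-line input that does contain non-alphanumeric characters. The sketch below contrasts it with a `find()`-based check; the names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesPitfall {
    private static final Pattern NON_ALNUM = Pattern.compile("[^a-zA-Z0-9]");

    // Whole-string match, as in the snippet above.
    static boolean viaMatches(String value) {
        return value.matches(".*[^a-zA-Z0-9].*");
    }

    // Searches for a single offending character anywhere in the input.
    static boolean viaFind(String value) {
        return NON_ALNUM.matcher(value).find();
    }

    public static void main(String[] args) {
        System.out.println(viaMatches("Hello-World")); // true
        System.out.println(viaFind("Hello-World"));    // true
        // Two consecutive newlines: neither .* can cross the second one.
        System.out.println(viaMatches("a\n\nb"));      // false
        System.out.println(viaFind("a\n\nb"));         // true
    }
}
```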

edited May 12 '16 at 11:42

Shabbir Dhangot

answered May 12 '16 at 11:10

gherson
