126

I want to write a simple regular expression that checks whether a given string contains any special character. My regex works, but for some reason it also matches every digit, so entering a number triggers the error.

My code:

//pattern to find if there is any special character in string
Pattern regex = Pattern.compile("[$&+,:;=?@#|'<>.-^*()%!]");
//matcher to find if there is any special character in string
Matcher matcher = regex.matcher(searchQuery.getSearchFor());

if(matcher.find())
{
    errors.rejectValue("searchFor", "wrong_pattern.SearchQuery.searchForSpecialCharacters","Special characters are not allowed!");
}
Suraj Rao
Piotr Sagalara

25 Answers

264

Please don't do that... little Unicode BABY ANGELs like this one are dying! ◕◡◕ (← these are not images) (nor is the arrow!)

And you are killing 20 years of DOS :-) (the last smiley is called WHITE SMILING FACE... Now it's at 263A... But in ancient times it was ALT-1)

and his friend

BLACK SMILING FACE... Now it's at 263B... But in ancient times it was ALT-2

Try a negative match:

Pattern regex = Pattern.compile("[^A-Za-z0-9]");

(this accepts only the "standard" letters A-Z and a-z and the "standard" digits 0-9; every other character is matched as special.)

xanatos
  • 2
    @AbdullahShoaib Clearly not :) You'll need to do a full list of what you consider "special" and/or what you consider "good". – xanatos Mar 09 '15 at 09:58
  • I notice many people use `[A-Za-z0-9]` to represent any letter or digit, both lowercase and upper, but is it not better to just do `[0-z]`? – Abraham Murciano Benzadon Jun 25 '17 at 16:09
  • 5
    @AbrahamMurcianoBenzadon: The decimal digits, the upper case roman letters, and the lower case roman letters occupy three _disjoint_ ranges of character code space. – Solomon Slow Jun 25 '17 at 17:31
  • 1
    @AbrahamMurcianoBenzadon You can see what James wrote in the handy screenshot of Character Map posted by Sina in another response: your regex would accept *:;<=>?@[\]^_`* (other than 0-9, a-z, A-Z) – xanatos Jun 26 '17 at 06:15
  • @AbdullahShoaib to handle non-english letters correctly, it's better to use `[^\p{Alnum}]` – arekolek Aug 07 '17 at 11:09
  • 2
    Let's assume that we use [A-Za-z0-9], but if we need to take care of Cyrillic or a few more alphabets, then how should the regex be written? – Kaloyan Stamatov Oct 18 '18 at 15:14
  • @KaloyanStamatov See https://stackoverflow.com/questions/6256825/cyrillic-alphabet-validation – xanatos Oct 19 '18 at 06:46
  • 4
    there are more languages than English ... – PeiSong Dec 15 '21 at 01:51
49

You have a dash in the middle of the character class, where it denotes a character range. Put the dash at the end of the class, like so:

[$&+,:;=?@#|'<>.^*()%!-]
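A minimal Java sketch of the difference, using the pattern from the question and the corrected one from this answer. The class and method names (HyphenRangeDemo, hasSpecial) are made up for illustration only.

```java
import java.util.regex.Pattern;

final class HyphenRangeDemo {
    // Pattern from the question: ".-^" is read as a range from '.' (46) to '^' (94).
    static final Pattern BUGGY = Pattern.compile("[$&+,:;=?@#|'<>.-^*()%!]");
    // Same characters with the hyphen moved to the end, where it is a literal.
    static final Pattern FIXED = Pattern.compile("[$&+,:;=?@#|'<>.^*()%!-]");

    // Returns true if the pattern finds at least one match anywhere in s.
    static boolean hasSpecial(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSpecial(BUGGY, "12345")); // true: digits fall inside .-^
        System.out.println(hasSpecial(FIXED, "12345")); // false
        System.out.println(hasSpecial(FIXED, "a-b"));   // true: '-' is now a literal
    }
}
```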
Jerry
33

That's because your pattern contains .-^, which is a range covering every character from . to ^ inclusive. That range includes the digits and several other characters (the uppercase letters among them).

If by special characters you mean punctuation and symbols, use:

[\p{P}\p{S}]

which matches all Unicode punctuation and symbols.
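As a rough Java sketch of this approach (the backslashes are doubled because the pattern sits in a Java string literal; the class name PunctSymbolDemo and the helper hasSpecial are hypothetical):

```java
import java.util.regex.Pattern;

final class PunctSymbolDemo {
    // Any character in the Unicode general categories P (punctuation) or S (symbol).
    static final Pattern PUNCT_OR_SYMBOL = Pattern.compile("[\\p{P}\\p{S}]");

    static boolean hasSpecial(String s) {
        return PUNCT_OR_SYMBOL.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSpecial("abc123"));                      // false
        System.out.println(hasSpecial("a+b"));                         // true: '+' is a math symbol
        System.out.println(hasSpecial("5\u20AC"));                     // true: the euro sign is a currency symbol
        System.out.println(hasSpecial("\u00C4rger \u00FCber \u00D6l")); // false: letters and spaces only
    }
}
```

Note that whitespace belongs to neither category, so spaces are not reported as special here.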

Sina Iravanian
30

Since you don't have whitespace and underscore in your character class, I think the following regex will be better for you:

Pattern regex = Pattern.compile("[^\\w\\s]");

This matches everything other than [A-Za-z0-9\s_].

Unicode version:

Pattern regex = Pattern.compile("[^\\p{L}\\d\\s_]");
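A small sketch of how the two variants differ on accented input, assuming Java's default (non-Unicode) meaning of \w. The names UnicodeWordDemo, ASCII_ONLY and UNICODE are invented for this example.

```java
import java.util.regex.Pattern;

final class UnicodeWordDemo {
    // By default \w in Java is ASCII-only: [a-zA-Z_0-9].
    static final Pattern ASCII_ONLY = Pattern.compile("[^\\w\\s]");
    // \p{L} covers letters of any script, so accented letters are accepted.
    static final Pattern UNICODE = Pattern.compile("[^\\p{L}\\d\\s_]");

    static boolean hasSpecial(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String text = "caf\u00E9 au lait";
        System.out.println(hasSpecial(ASCII_ONLY, text));   // true: the accented e is flagged
        System.out.println(hasSpecial(UNICODE, text));      // false
        System.out.println(hasSpecial(UNICODE, "50% off")); // true: '%' is flagged
    }
}
```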
anubhava
17

For people (like me) looking for an answer that covers special characters like Ä, just use one of these patterns:

  • Only text (or a space): "[A-Za-zÀ-ȕ ]"

  • Text and numbers: "[A-Za-zÀ-ȕ0-9 ]"

  • Text, numbers and some special chars: "[A-Za-zÀ-ȕ0-9(),_. -]" (the hyphen goes last so that it is a literal; written as ,-_ it would form a range from , to _)

A range in a character class works on character codes: it matches any character whose code lies between the two endpoints, inclusive ([start-end]).

So you can add any range.

Finally, you can experiment with a handy tool: https://regexr.com/

Good luck;)

13

Use this to catch the common special characters excluding .-_.

/[!"`'#%&,:;<>=@{}~\$\(\)\*\+\/\\\?\[\]\^\|]+/

If you want to include .-_ as well, then use this:

/[-._!"`'#%&,:;<>=@{}~\$\(\)\*\+\/\\\?\[\]\^\|]+/

If you want to filter strings that are URL friendly and do not contain any special characters or spaces, then use this:

/^[^ !"`'#%&,:;<>=@{}~\$\(\)\*\+\/\\\?\[\]\^\|]+$/

When you use patterns like /[^A-Za-z0-9]/, you will also start catching letters from other languages, including some European accented letters (like é, í).

Ehsan Kazi
11

Shout out to Mohamed Yusuff's solution!

We can match all 32 special characters using ranges.

[!-\/:-@[-`{-~]

1st Group

[!-\/]

  • Match ASCII code from 33 to 47:
  • !"#$%&'()*+,-./

-- 15 out of 32 characters matched

2nd Group

[:-@]

  • Match ASCII code from 58 to 64:
  • :;<=>?@

-- 7 out of 32 characters matched

3rd Group

[[-`]

  • Match ASCII code from 91 to 96:
  • [\]^_`

-- 6 out of 32 characters matched

4th Group

[{-~]

  • Match ASCII code from 123 to 126:
  • {|}~

-- 4 out of 32 characters matched

In total, all 32 characters are matched (15 + 7 + 6 + 4).
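The count can be checked with a short Java sketch that tries every ASCII code against the four ranges. The class name AsciiSpecialCount is hypothetical, and the '[' is escaped here because Java would otherwise read an unescaped '[' inside a class as the start of a nested class.

```java
import java.util.regex.Pattern;

final class AsciiSpecialCount {
    // The four ranges: 33-47, 58-64, 91-96, 123-126.
    static final Pattern SPECIAL = Pattern.compile("[!-/:-@\\[-`{-~]");

    // Collects every ASCII character (codes 0 to 127) that the class matches.
    static String matchedAscii() {
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (SPECIAL.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String matched = matchedAscii();
        System.out.println(matched.length()); // 32
        System.out.println(matched);
    }
}
```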

Reference

Special Character table_Arranged

Extended ASCII table

jas99
9

I have defined a pattern that looks for any ASCII special character in the range 032 to 126, excluding the alphanumerics. You may use something like the one below:

To find any Special Character:

[ -\/:-@\[-\`{-~]

To require at least one special character, with no upper limit on the count:

(?=.*[ -\/:-@\[-\`{-~]{1,})

These patterns cover the special characters in the ranges 032 to 047, 058 to 064, 091 to 096, and 123 to 126.

8

Here is my regex variant of a special character:

String regExp = "^[^<>{}\"/|;:.,~!?@#$%^=&*\\]\\\\()\\[¿§«»ω⊙¤°℃℉€¥£¢¡®©0-9_+]*$";

(Java code)

VKostenc
6

Use the regular expression pattern "^[a-zA-Z0-9]*$". It validates that a string is purely alphanumeric, i.e. contains no special characters.
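A minimal sketch of this whitelist check in Java; the names AlnumWhitelist and isClean are invented for the example. One thing worth noticing is that, because of the *, an empty string also passes.

```java
import java.util.regex.Pattern;

final class AlnumWhitelist {
    static final Pattern ALNUM = Pattern.compile("^[a-zA-Z0-9]*$");

    // True when the whole string consists of ASCII letters and digits only.
    static boolean isClean(String s) {
        return ALNUM.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isClean("abc123")); // true
        System.out.println(isClean("abc!"));   // false
        System.out.println(isClean(""));       // true: * also allows zero characters
    }
}
```

If empty input should be rejected, replacing * with + is one way to do it.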

sam
5

If you only need to handle ASCII characters, you can use the hex ranges from the ASCII table. Here is a regex that matches all special characters in the ranges 33-47, 58-64, 91-96, and 123-126:

[\x21-\x2F\x3A-\x40\x5B-\x60\x7B-\x7E]

However, you can also think of special characters as anything that is not a normal character. If we take that approach, you can simply do this:

^[A-Za-z0-9\s]+

However, this will not catch _ ^ and probably others.

Serguei Fedorov
  • I finally used `(?i)^([[a-z][^a-z0-9\\s\\(\\)\\[\\]\\{\\}\\\\^\\$\\|\\?\\*\\+\\.\\<\\>\\-\\=\\!\\_]]*)$` to match any character. – cdaiga Feb 17 '16 at 11:25
  • 2
    **Never use `[A-z]` in a regex.** It matches all uppercase and lowercase ASCII letters as you would expect. but it also matches several punctuation characters whose code points lie between `Z` and `a`. Use `[A-Za-z]` instead, or `[a-z]` in case-insensitive mode. – Alan Moore Feb 17 '16 at 18:37
  • @AlanMoore, good to know! I'll make the change to the answer. – Serguei Fedorov Feb 17 '16 at 18:39
  • How about the '.' (dot) character? It is supposed to match any character except a newline. In Python, re.DOTALL makes it match everything, including newlines. Check out the regular expression HOWTO in the Python docs: https://docs.python.org/2/howto/regex.html – Dr Deo Sep 06 '18 at 15:01
4

Try:

(?i)^([[a-z][^a-z0-9\\s\\(\\)\\[\\]\\{\\}\\\\^\\$\\|\\?\\*\\+\\.\\<\\>\\-\\=\\!\\_]]*)$

(?i)^(A)$: (?i) makes the regular expression A case insensitive, and ^ and $ anchor it to the whole string.

[a-z]: represents any alphabetic character from a to z.

[^a-z0-9\\s\\(\\)\\[\\]\\{\\}\\\\^\\$\\|\\?\\*\\+\\.\\<\\>\\-\\=\\!\\_]: represents any character other than a to z, digits, whitespace, and the listed special characters, i.e. accented characters and the like.

[[a-z][^a-z0-9\\s\\(\\)\\[\\]\\{\\}\\\\^\\$\\|\\?\\*\\+\\.\\<\\>\\-\\=\\!\\_]]: represents alphabetic characters only, accented or unaccented.

*: zero or more occurrences of the regex that precedes it.

Alan Moore
cdaiga
  • 2
    Inside a character class, none of those characters need to be escaped except `\` and `-`. Many of them never need to be escaped at all. "Better safe than sorry" is a fine philosophy, but readability is important, too. – Alan Moore Feb 17 '16 at 18:46
  • @AlanMoore (If you're the comic book author, extra credit), the "-" I've found can be left unescaped if left as the trailing character. `[a-z_=-]` matches a-z, _, =, or -. I place readability above all else in anything of the form "regex", but yeah, using the shortcuts can lead to woes eventually. – alife Sep 04 '22 at 19:35
3

Try using StringUtils.isAlphanumeric(value) for the same purpose (assuming the StringUtils from Apache Commons Lang is available).
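If pulling in a library is not an option, a rough standard-library approximation of such a check could look like the sketch below. This is a hypothetical helper, not the library's implementation; it assumes that "alphanumeric" means Unicode letters and digits and that null or empty input should be rejected.

```java
final class AlnumCheck {
    // True when s is non-empty and every char is a Unicode letter or digit.
    static boolean isAlphanumeric(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        return s.chars().allMatch(Character::isLetterOrDigit);
    }

    public static void main(String[] args) {
        System.out.println(isAlphanumeric("abc123"));    // true
        System.out.println(isAlphanumeric("a b"));       // false: the space is rejected
        System.out.println(isAlphanumeric("caf\u00E9")); // true: accented letters count as letters
    }
}
```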

Buddy
Ash
  • A space/blank also counts as a special character with this method. It is better to strip spaces and tabs before calling it. – Deepu Sahni Mar 21 '18 at 01:14
3

We can achieve this using Pattern and Matcher as follows:

Pattern pattern = Pattern.compile("[^A-Za-z0-9 ]");
Matcher matcher = pattern.matcher(trString);
boolean hasSpecialChars = matcher.find();
KayV
3

Here is the regular expression that I used for removing all the special characters from any string:

String regex = ("[ \\\\s@  [\\\"]\\\\[\\\\]\\\\\\\0-9|^{#%'*/<()>}:`;,!& .?_$+-]+");
Jan Černý
3

Please use this; it is the simplest.

\p{Punct} Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~

https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

    StringBuilder builder = new StringBuilder(checkstring);
    String regex = "\\p{Punct}"; // special characters: `~!@#$%^&*()-_+=\|}{]["';:/?.,><
    // replace every special character with ""
    Pattern  pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(builder.toString());
    checkstring=matcher.replaceAll("");
Nick
Energy
  • this answer matches the common list of special characters used by (american?) login systems. https://support.okta.com/help/s/article/What-special-characters-are-accepted-by-the-Okta-password?language=en_US – john k Feb 15 '23 at 20:42
2

You can use a whitelist of allowed characters (tested with matches(), so that the whole input has to consist of them):

Pattern regex = Pattern.compile("([a-zA-Z0-9])*"); (For zero or more characters)

or

Pattern regex = Pattern.compile("([a-zA-Z0-9])+"); (For one or more characters)

Tek Nath Acharya
  • 1
    The question is not about allowing only roman numerals and English letters; what if the user wanted to accept Japanese text? Your solution is not going to work. – mightyWOZ Jan 08 '20 at 06:19
1

Try this. It works in C# and should work in Java as well: @"[^\p{L}\p{Nd}]+" (in a Java string literal the backslashes are doubled: "[^\\p{L}\\p{Nd}]+"). If you want to exclude spaces, just add \s inside the class.

danvasiloiu
1

To find any number of special characters use the following regex pattern: ([^(A-Za-z0-9 )]{1,})

[^(A-Za-z0-9 )] means any character except letters, digits, and the space. {1,} means one or more characters of the previous block.

Dharman
mlnbhargav
  • 1
    It won't find `(` and `)`. – Wiktor Stribiżew Oct 19 '20 at 12:45
  • The `(` and `)` are problematic here. `[^A-Za-z_=]`, for example, allow for anything other than A-Z or a-z or _ or = to trigger. `[^[:alnum:][:punct:]]` similarly triggers on any character not alphanumeric nor punctuation. – alife Sep 04 '22 at 19:31
0

(^\W$)

^ - start of the string, \W - match any non-word character [^a-zA-Z0-9_], $ - end of the string

0

A small addition to include all special characters like: ū and Ā:

An example:

Pattern regex = Pattern.compile("[A-Za-zÀ-ÖØ-öø-ū]");
0

You have to escape some symbols

/([!`\-\_.\"\'#%,:;<>=@{}~\$\(\)\*\+\/\\\?\[\]\^\|]+)/

OR

/([\!\"\#\$\%\&\'\(\)\*\+\,\-\.\/\:\;\<\>\=\?\@\[\]\{\}\\\\\^\_\`\~]+$)/
Dexter
0

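To match the common ASCII special characters from ! to / (codes 33 to 47), you can simply use [!-\/].

So, in Java it will be Pattern regex = Pattern.compile("[!-/]"); (inside a Java string literal the slash must not be escaped, since "\/" is not a valid escape).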

Mostafa Wael
0

What about [ -~]? This matches all ASCII characters from space to tilde, which includes letters and digits as well.

Wasim A.
-1

I use the regex below to find special characters in a string (the hyphen is placed last so that it is taken literally rather than as a range):

var reg = new RegExp("[`~!@#$%^&*()\\]\\[+={}/|:;\"\'<>,.?_-]");