Remove all special characters with RegExp

Question

I would like a RegExp that will remove all special characters from a string. I am trying something like this but it doesn’t work in IE7, though it works in Firefox.

var specialChars = "!@#$^&%*()+=-[]\/{}|:<>?,.";

for (var i = 0; i < specialChars.length; i++) {
  stringToReplace = stringToReplace.replace(new RegExp("\\" + specialChars[i], "gi"), "");
}

A detailed description of the RegExp would be helpful as well.
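One likely reason for the IE7/Firefox difference is that older JScript engines (IE7 and earlier) do not support bracket indexing on strings, so `specialChars[i]` evaluates to `undefined`, while `charAt(i)` works in every browser. Below is a sketch that keeps the question's blacklist loop but uses `charAt`. The helper name `removeBlacklisted` is hypothetical. It also adds a backslash to the list, because in the original string literal `"\/"` is just `"/"`.

```javascript
// Blacklist from the question, with "\\" added so a literal backslash is
// also removed (in the original literal, "\/" is just "/").
var specialChars = "!@#$^&%*()+=-[]\\/{}|:<>?,.";

// Hypothetical helper name, used only for this illustration.
function removeBlacklisted(input) {
  var result = input;
  for (var i = 0; i < specialChars.length; i++) {
    // charAt(i) instead of specialChars[i]: IE7's JScript returns
    // undefined for bracket indexing on strings.
    var ch = specialChars.charAt(i);
    // Prefix a backslash so metacharacters such as "$", "^" or "[" are
    // matched literally; the "g" flag replaces every occurrence.
    result = result.replace(new RegExp("\\" + ch, "g"), "");
  }
  return result;
}

console.log(removeBlacklisted("a+b=c [d] e\\f")); // "abc d ef"
console.log(removeBlacklisted("$5^2"));           // "52"
```

This keeps the blacklist approach. As the comments below point out, a whitelist is usually safer, because a blacklist can never list every character.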

Something like this would be better off as a whitelist, not a blacklist. Then you could just do [a-z]|[0-9]|\s — Ape-inago, Dec 07 '10 at 08:49
Any script error? Did you debug? Or else put a try...catch block in the JavaScript code. — Kangkan, Dec 07 '10 at 08:49
@Ape-inago, can you please explain RegExp a bit more? — Timothy Ruhle, Dec 07 '10 at 08:50
Please define "special character"! Is "風" special for you? (Thinking about this you'll see @Ape-inago's point.) — deceze, Dec 07 '10 at 08:53
What about "！＠＃＄＾＆％＊（）＋＝ー"? (No, these are not the same as above.) :-P — deceze, Dec 07 '10 at 08:57
@deceze I do realise there are something like 300 ASCII characters; these characters were just for the example. I didn't know about RegExp or that I could use a whitelist. — Timothy Ruhle, Dec 07 '10 at 09:05
@Timothy Better try *109,000+ characters* supported by Unicode, which is what JavaScript uses internally. Just some general, well-intentioned advice: whenever you think "special characters", be a little more precise. :-) — deceze, Dec 07 '10 at 09:09
I don't think anyone here meant any offence. I've been burned before by doing it as a blacklist, since there are always those little "gotchas" that end up getting through (like deceze's examples). Ultimately, the correct approach depends on why you are trying to do this. — Ape-inago, Dec 07 '10 at 20:56

annakata · Accepted Answer · score 755 · 2010-12-07T09:00:47.457

var desired = stringToReplace.replace(/[^\w\s]/gi, '')

As mentioned in the comments, it's easier to do this as a whitelist: replace the characters that aren't in your safelist.

The caret (^) at the start of [...] negates the set. The flags gi mean global and case-insensitive (the latter is a bit redundant, but I wanted to mention it). The safelist in this example is letters, digits and underscores (\w, which in JavaScript is exactly [A-Za-z0-9_]) plus whitespace (\s).
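Here is a small sketch of the whitelist in action. The helper name `stripNonWordChars` is hypothetical, and the redundant `i` flag is dropped. Because `\w` covers only `[A-Za-z0-9_]`, accented and non-Latin letters are removed too, which is the limitation raised in the comments.

```javascript
// Keep word characters (\w = [A-Za-z0-9_]) and whitespace (\s); the caret
// inside [...] negates the set, so everything else matches and is removed.
function stripNonWordChars(input) {
  return input.replace(/[^\w\s]/g, "");
}

console.log(stripNonWordChars("Hello, World! #42_ok")); // "Hello World 42_ok"
console.log(stripNonWordChars("Tromsø"));               // "Troms" (ø is not in \w)
```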

edited Dec 07 '10 at 09:00

answered Dec 07 '10 at 08:55

annakata

This solution does not work for non-English letters, for example "Їжак". – Seagull Oct 21 '14 at 08:52
You can also use uppercase \W instead of ^\w. \W : Matches any non-word character. Equivalent to [^A-Za-z0-9_]. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions – delkant Jun 24 '16 at 22:14
@Seagull I have added an answer which handles Unicodes. – freedev Nov 27 '16 at 17:54
To accept accented words, as in Portuguese, do this: stringToReplace.replace(/[^A-Za-zÀ-ú\s]/gi, '') – alansiqueira27 Apr 10 '17 at 16:46
This also removes Chinese characters; how can I exclude them from the replacement? – Ryan Apr 24 '17 at 09:54
To add most European languages (Norwegian, Swedish, German, Portuguese, Spanish): stringToReplace.replace(/[^\w\s\xc0-\xff]/gi, ''). For other languages, Unicode ranges can be used. See: https://stackoverflow.com/questions/150033/regular-expression-to-match-non-english-characters – Eskil Mjelva Saatvedt Apr 12 '19 at 12:18
Best for me, considering I don't want any accents or specials. I don't even want spaces, so I removed `\s`. – tatsu Sep 25 '19 at 11:59
var sessionName = '\ / ? * [ ]'; sessionName.replace(/[^\w\s]/gi, '-'); When I use your script it should return 6, but it returns only 5. It seems to skip *. Why? – Jeba May 08 '20 at 07:44
This seems to skip _ (underscore). – Fayaz Jun 08 '21 at 13:35

score 168 · Answer 2 · answered Jun 18 '12 at 20:10

Note that if you still want to use a blacklist, i.e. remove a specific set of characters such as slashes and other special characters, you can do the following:

var outString = sourceString.replace(/[`~!@#$%^&*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');

Take special note that to include the "minus" character you need to escape it with a backslash, as in the \- above. If you don't, +-= is read as a range from + to =, which also selects 0-9 (plus , . / : ; <), which is probably undesired.
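A quick way to see the range problem: inside `[...]`, an unescaped `-` between two characters means "everything from the first to the second". In this sketch, `+-=` spans U+002B to U+003D, which includes the digits.

```javascript
const sample = "a1+b-2=c";

// Unescaped: "+-=" is a range from "+" (U+002B) to "=" (U+003D),
// which also covers , - . / 0-9 : ; <
const asRange = sample.replace(/[+-=]/g, "");

// Escaped: three literal characters "+", "-" and "=".
const asLiterals = sample.replace(/[+\-=]/g, "");

console.log(asRange);    // "abc"
console.log(asLiterals); // "a1b2c"
```

Putting `-` first or last in the class (e.g. `[+=-]`) also makes it a literal hyphen.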

answered Jun 18 '12 at 20:10

noinput

Excellent solution! The accepted answer only works for English; this one works for any language (as far as I checked). Thanks :) – Ronen Ness Nov 27 '14 at 12:55
@knutole remove the `?` from the character set portion towards the front. this lists the characters you want to remove, so excluding it from being stripped will inherently include it in the final result. – noinput Mar 02 '16 at 16:33
This works great and fits any language: you just add the characters you want to replace. Thanks. – Kevin Ramirez Zavalza Jun 23 '18 at 02:56
How would I implement this on a search input? How do I test the input against this RegEx? – PhilosophOtter May 09 '21 at 14:13
By the way, there is no need to escape `{` and `}`. Like: `var outString = sourceString.replace(/[\`~!@#$%^&*()_|+\-=?;:'",.<>{}\[\]\\\/]/gi, '');` – Aldis Mar 05 '22 at 13:11

score 31 · Answer 3 · edited Jun 20 '20 at 09:12

Plain JavaScript regex classes such as \w do not handle Unicode letters.

Do not use [^\w\s]: it will remove letters with accents (like àèéìòù), not to mention Cyrillic or Chinese; letters from such languages will be completely removed.

You really don't want to remove these letters together with all the special characters. You have two options:

Add to your regex all the letters you don't want removed, for example: [^èéòàùì\w\s].
Have a look at xregexp.com. XRegExp adds basic support for Unicode matching via the \p{...} syntax.

var str = "Їжак::: résd,$%& adùf"
var search = XRegExp('[^\\pL ]+');
var res = XRegExp.replace(str, search, '', "all");

console.log(res); // returns "Їжак résd adùf"
console.log(str.replace(/[^\w\s]/gi, '') ); // returns " rsd adf"
console.log(str.replace(/[^\wèéòàùì\s]/gi, '') ); // returns " résd adùf"

<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.js"></script>
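Since this answer was written, ES2018 added Unicode property escapes to JavaScript itself: with the `u` flag, `\p{L}` matches any letter and `\p{N}` any number. Here is a sketch without XRegExp. It assumes a runtime that supports ES2018 regex features; current browsers and Node.js do, older ones such as IE11 do not. The helper name is hypothetical.

```javascript
// Keep Unicode letters (\p{L}), Unicode numbers (\p{N}) and whitespace (\s);
// remove everything else. The "u" flag is required for \p{...}.
function removeSpecialsUnicode(input) {
  return input.replace(/[^\p{L}\p{N}\s]/gu, "");
}

console.log(removeSpecialsUnicode("Їжак::: résd,$%& adùf")); // "Їжак résd adùf"
console.log(removeSpecialsUnicode("风!"));                    // "风"
```

Accents stored as separate combining marks (`\p{M}`) would still be removed. Add `\p{M}` to the class if your input may be in decomposed form.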

edited Jun 20 '20 at 09:12

Community

answered Nov 27 '16 at 17:25

freedev

Good to know for internationalization; I had no idea JS regex wasn't Unicode-aware. – LessQuesar Nov 08 '17 at 07:52
You can't put all valid UTF-8 letters into var str – Seagull Jan 12 '18 at 13:51
@Seagull yes, but if you're not writing a worldwide application, you can pragmatically list only the valid letters for your current localizations. In my case, for Italian, there are only a few letters. – freedev Jan 12 '18 at 13:56

score 15 · Answer 4 · answered May 18 '21 at 11:53

Using \W or [a-z0-9] won't work for non-English languages such as Chinese.

It's better to list all the special characters in the regex and remove them from the given string:

str.replace(/[~`!@#$%^&*()+={}\[\];:\'\"<>.,\/\\\?\-_]/g, '');
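Two details matter when writing a blacklist like this. First, the hyphen must be escaped or placed last; otherwise something like `?-_` becomes a range from `?` (U+003F) to `_` (U+005F) that silently removes every uppercase letter A-Z. Second, anything not listed, such as CJK or Cyrillic letters, is left alone. Here is a sketch; the helper name is hypothetical.

```javascript
// Hypothetical helper: strip an explicit list of ASCII punctuation/symbols.
// The "-" is placed last in the class so it is a literal hyphen, not a range.
function removeListedSpecials(str) {
  return str.replace(/[~`!@#$%^&*()+={}\[\];:'"<>.,\/\\?_-]/g, "");
}

console.log(removeListedSpecials("Hello, 世界! Їжак?")); // "Hello 世界 Їжак"
console.log(removeListedSpecials("A-Z_ok"));           // "AZok"
```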

answered May 18 '21 at 11:53

Manikanta C.S.E

Seagull · Answer 5 · score 12 · 2019-04-26T17:08:03.053

The first solution does not work for non-Latin alphabets (it will strip text such as Їжак). I have created a function that does not use RegExp and instead relies on the JavaScript engine's Unicode support. The idea is simple: if a character is the same in uppercase and lowercase, it is a special character. The only exception is whitespace.

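function removeSpecials(str) {
    // Convert once to each case; cased letters differ between the two, symbols don't.
    var lower = str.toLowerCase();
    var upper = str.toUpperCase();

    var res = "";
    for (var i = 0; i < lower.length; ++i) {
        // Keep cased letters and whitespace; drop everything else.
        if (lower[i] !== upper[i] || lower[i].trim() === '')
            res += str[i];
    }
    return res;
}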

Update: Please note that this solution works only for languages that have lowercase and uppercase letters. It won't work for languages like Chinese.

Update 2: I came up with the original solution while working on a fuzzy search. If you are also removing special characters to implement search functionality, there is a better approach: use a transliteration library that produces a string containing only Latin characters, and then a simple RegExp will take care of removing the special characters. (This also works for Chinese, and as a side benefit makes Tromsø == Tromso.)
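To illustrate the "normalize first, then apply a simple regex" idea, here is a sketch using `String.prototype.normalize("NFD")` to strip combining diacritical marks. This is a swapped-in and much weaker technique than a real transliteration library. It turns é into e, but letters without a canonical decomposition, such as ø, pass through unchanged, and Chinese is not transliterated at all. The helper name is hypothetical.

```javascript
// NOT transliteration: NFD splits "é" into "e" + U+0301, and the regex then
// drops the combining marks in the U+0300-U+036F block.
function foldDiacritics(input) {
  return input.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(foldDiacritics("Crème Brûlée!").replace(/[^\w\s]/g, "")); // "Creme Brulee"
console.log(foldDiacritics("Tromsø")); // "Tromsø" (ø has no decomposition)
```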

edited Apr 26 '19 at 17:08

answered Oct 21 '14 at 08:50

Seagull

Excellent, I like this answer! I use it for creating valid filenames and have extended your solution to remove spaces (Linux/Unix compatible) and to allow numbers as well. So I extended the if statement (jQuery involved): if(str[i] !== ' ' && (lower[i] != upper[i] || lower[i].trim() === '' || $.isNumeric(str[i]))) – Jonny Dec 06 '17 at 11:39
In many languages there are no uppercase letters, so the function will treat valid input as special characters. – Yair Levy May 01 '18 at 19:04
Chinese characters are one example that get stripped out by this – lethek Jul 04 '18 at 00:19
When I created this solution, unfortunately, I was not thinking about languages like Chinese. I still proposed it because the previous answers don't work for them either. – Seagull Jul 19 '18 at 19:55

millebii · Answer 6 · score 2 · 2010-12-07T09:02:58.657

I use RegexBuddy for debugging my regexes; it supports almost all languages and is very useful. Then I copy/paste the regex into the target language. Terrific tool, and not very expensive.

So I copy/pasted your regex, and your issue is that [ and ] are special characters in regex, so you need to escape them. So the regex should be: /[!@#$^&%*()+=\-\x5B\x5D\/{}|:<>?,.]/g
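For reference, inside a character class the brackets can be written either as `\x5B` / `\x5D` (their code points) or as `\[` / `\]`. Both forms match the literal characters, as this small sketch shows:

```javascript
// Both character classes match a literal "[" or "]".
const viaHex = "a[b]c".replace(/[\x5B\x5D]/g, "");
const viaEscape = "a[b]c".replace(/[\[\]]/g, "");

console.log(viaHex);    // "abc"
console.log(viaEscape); // "abc"
```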

edited Dec 07 '10 at 09:02

answered Dec 07 '10 at 08:54

millebii

score 2 · Answer 7 · answered Jun 22 '17 at 21:16

I did something like this: str.replace(/\s|[0-9_]|\W|[#$%^&*()]/g, ""). But some people did it much more simply, like str.replace(/[\W_]/g, "");
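Note that `/\W_/` without brackets matches a non-word character immediately followed by an underscore. To remove both non-word characters and underscores, the two need to be in one class, `[\W_]`. A short sketch:

```javascript
// [\W_] = any non-word character OR an underscore, i.e. everything except
// [A-Za-z0-9].
const cleaned = "a_b-c 1!".replace(/[\W_]/g, "");
console.log(cleaned); // "abc1"

// Without brackets, \W_ only matches a two-character sequence such as "-_".
console.log("a-_b".replace(/\W_/g, "")); // "ab"
console.log("a_b".replace(/\W_/g, ""));  // "a_b" (a lone "_" is untouched)
```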

answered Jun 22 '17 at 21:16

Eldar Mammadov

Most of the things in your approach are redundant, since `\W` contains some of the characters. But why would you filter out numbers? Those aren’t special characters. – Sebastian Simon Mar 02 '18 at 13:08

score 1 · Answer 8 · answered Apr 14 '22 at 10:17

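@Seagull's answer (https://stackoverflow.com/a/26482552/4556619) looks good, but you get the string "undefined" in the result when the input contains certain special (Turkish) characters. This happens because toLowerCase() can make a string longer ("İ".toLowerCase() is "i" followed by a combining dot above, two code units), so lower, upper and str no longer line up by index. See the example below.

let str="bənövşəyi пурпурный İdÖĞ";

I slightly improved it and patched it with an undefined check: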

function removeSpecials(str) {
    let lower = str.toLowerCase();
    let upper = str.toUpperCase();

    let res = "",i=0,n=lower.length,t;
    for(i; i<n; ++i) {
        if(lower[i] !== upper[i] || lower[i].trim() === ''){
            t=str[i];
            if(t!==undefined){
                res +=t;
            }
        }
    }
    return res;
}
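The undefined check hides the visible symptom, but the three strings can still be misaligned. As a result, a character may be kept or dropped based on its neighbour: with "İ!", for example, the "!" survives. Below is a sketch of a hypothetical variant that compares each character's own case forms instead. Like the original idea, it still drops letters from scripts without case, such as Chinese or Hebrew.

```javascript
// Hypothetical variant (not the answer's code): decide per character instead of
// indexing into str.toLowerCase()/str.toUpperCase(), so mappings that change
// length, such as "İ" -> "i" + U+0307, cannot shift the indexes.
function removeSpecialsPerChar(str) {
  let res = "";
  // for...of walks the string by code point, not by UTF-16 code unit.
  for (const ch of str) {
    // Keep characters with distinct case forms (cased letters) and whitespace.
    if (ch.toLowerCase() !== ch.toUpperCase() || ch.trim() === "") {
      res += ch;
    }
  }
  return res;
}

console.log(removeSpecialsPerChar("bənövşəyi пурпурный İdÖĞ!?")); // "bənövşəyi пурпурный İdÖĞ"
console.log(removeSpecialsPerChar("中文 ok")); // " ok" (uncased scripts are still dropped)
```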

Wow, that's a pretty amazing idea. I'd implement it differently. However, it doesn't support languages that have no uppercase letters, such as Hebrew, Arabic, Chinese, etc. — oriadam, Apr 03 '23 at 13:39

score 1 · Answer 9 · answered Sep 14 '22 at 05:58

text.replace(/[`~!@#$%^*()_|+\-=?;:'",.<>\{\}\[\]\\\/]/gi, '');

answered Sep 14 '22 at 05:58

tyne

score -1 · Answer 10 · answered Dec 07 '10 at 08:57

Why don't you do something like:

var re = /^[a-z0-9 ]+$/i;
var isValid = re.test(yourInput);

to check whether your input contains any special characters (isValid is false if it does).
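Note that without a quantifier, a pattern such as `^[a-z0-9 ]$` matches only a string that is exactly one character long. Here is a sketch of the whole-string check with `+`:

```javascript
// ^ and $ anchor the whole input; "+" allows one or more allowed characters.
const allowedOnly = /^[a-z0-9 ]+$/i;

console.log(allowedOnly.test("abc 123"));  // true
console.log(allowedOnly.test("abc!"));     // false
console.log(/^[a-z0-9 ]$/i.test("abc"));   // false: only one character allowed
```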

answered Dec 07 '10 at 08:57

AnD

The OP says he's trying to remove special characters not see if they exist. – annakata Dec 07 '10 at 09:01
This is a good solution, but it only allows English letters, numbers and spaces; it will reject characters like `èéòàùì`, so in some cases it is not the right solution. – mapmalith Sep 04 '19 at 05:10
