2

I have a list of special characters that I want removed from strings. The list is as follows (separated by spaces):

! % & \ ' ( ) * + - . / ; < = > ? \\ , : # @ \t \r \n " [] _

I was trying to write my replace call like this, but I ran into trouble with characters like //, \t, \r, \n and []

var input = 'test ! % & \ ' ( ) * + - . / ; < = > ? \\ , : # @ \t \r \n " [] _ test';
input.replace(/[!%&\'()*+-./;<=>?\\,:#@\t\r\n"[]_][\u007B-\u00BF]/g, "");

Is there a better way to correctly do this? Or is it possible to use an array as restricted characters somehow?

liquid5109
  • 2
    ‘*ran into trouble*’ And what trouble would that be, exactly? Could it be the unescaped `]`? Or the `+-.` bit? – Biffen Aug 20 '15 at 10:54
  • Is `[]` really one character, or is there a space missing in your list? – Sebastian Simon Aug 20 '15 at 10:56
  • It may be a better approach to search for what _can_ remain in the string and remove anything else. E.g. if you want to keep letters, spaces and numbers, you would search for anything that is not a letter, number or space: `[^\w\d ]`. – marekful Aug 20 '15 at 10:58
  • @Biffen, Yes, I assume it was the unescaped characters, wasn't sure what to look for. – liquid5109 Aug 20 '15 at 11:02
  • @Xufox, That was a missing space on my end – liquid5109 Aug 20 '15 at 11:03
  • @marekful, That would make sense, but I was only given a small list of characters that should be removed, so in this case it made more sense to do it like that. – liquid5109 Aug 20 '15 at 11:03
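
The allow-list idea from the comments can be sketched as follows. This is only an illustration, assuming that ASCII letters, digits and spaces are the characters to keep; the function name keepAllowed is hypothetical. Since \w already covers digits and also matches the underscore (which is on the removal list), the class below spells the ranges out instead.

```javascript
// Hypothetical helper: remove everything except ASCII letters, digits and spaces.
function keepAllowed(input) {
    // [^...] is a negated class: it matches any character NOT listed inside it.
    return input.replace(/[^A-Za-z0-9 ]/g, "");
}

var cleaned = keepAllowed('test ! % _ [ ] \t 42 test');
// JSON.stringify makes the remaining whitespace visible.
console.log(JSON.stringify(cleaned));
```

This trades the explicit removal list for an explicit keep list, so any character nobody thought of (accented letters, for example) is removed as well.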

3 Answers

1

Here is the regex you can use:

var input = 'test ! % & \\ \' ( ) * + - . / ; < = > ? \\ , : # @ \t \r \n " [] _ test';
alert(input.replace(/[!%&'()*+./;<=>?\\,/:#@\t\r\n"\[\]_\u007B-\u00BF-]/g, ""));

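Note that ] must be escaped inside a JS regex character class (escaping [ as well is optional but harmless). The hyphen should be placed at the start or end of the class, or escaped; otherwise it creates a range, as in the +-. part of the original. I also assume you want to remove the characters in the \u007B-\u00BF range, so I merged the two character classes into one.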
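
As for the second half of the question (using an array of restricted characters), the same escaping rules allow the class to be generated. The following is a minimal sketch, not part of the regex above: the helper name buildRemovalRegex and the variable restricted are hypothetical, and it assumes every array entry is a single character. Ranges such as \u007B-\u00BF would have to be appended to the class body separately.

```javascript
// Hypothetical helper: build one character class from an array of single characters.
function buildRemovalRegex(chars) {
    var body = chars.map(function (ch) {
        // Inside a class only \ ] ^ and - can be special, so escape just those.
        return ch.replace(/[\\\]^-]/g, "\\$&");
    }).join("");
    return new RegExp("[" + body + "]", "g");
}

var restricted = ["!", "%", "&", "\\", "'", "(", ")", "*", "+", "-", ".", "/",
                  ";", "<", "=", ">", "?", ",", ":", "#", "@", "\t", "\r", "\n",
                  '"', "[", "]", "_"];
var re = buildRemovalRegex(restricted);

console.log("a-b[c]d\\e_f".replace(re, ""));
```

Escaping at build time means the array can hold the raw characters, so nobody has to remember which ones are special when the list changes.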

Wiktor Stribiżew
  • Thanks, escaping those characters seemed to have worked. I kept the range characters in a different class, just for readability. (As there are more ranges) – liquid5109 Aug 20 '15 at 11:04
  • @liquid5109 Keeping them in a different class makes the regex match quite different things altogether! I.e. one character from the first class *followed by* one from the second class. – Biffen Aug 20 '15 at 11:07
  • Sure, but in this case it is clear the code points denote "special" characters that most likely should be removed from the input string (judging by the question requirement). – Wiktor Stribiżew Aug 20 '15 at 11:26
  • 1
    @stribizhev We're agreeing, right? It's just @liquid5109 that needs to understand the difference between `[0-9a-z]` and `[0-9][a-z]`. – Biffen Aug 20 '15 at 11:38
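
The difference raised in these comments can be checked directly. A small sketch; the sample string is made up for illustration:

```javascript
var oneClass = /[0-9a-z]/g;      // one character that is a digit OR a lowercase letter
var twoClasses = /[0-9][a-z]/g;  // a digit FOLLOWED BY a lowercase letter

var sample = "a1b 22 c";

// Removes every digit and every lowercase letter.
console.log(JSON.stringify(sample.replace(oneClass, "")));
// Removes only the two-character sequence "1b".
console.log(JSON.stringify(sample.replace(twoClasses, "")));
```

With two adjacent classes, a listed character is removed only when a character from the second class comes right after it, which is why the ranges belong inside the same brackets.
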
1

Certain characters need to be escaped with a backslash.

These characters are .^$*+?()[{\|-

You also need to escape the / because you are using a JS regex literal.

In addition, having [\u007B-\u00BF] at the end means it will only match characters that are followed by one of these characters. It is not clear from your question if that is actually what you want.

So your regex should be:

input.replace(/[!%&'\(\)\*\+\-\.\/;<=>\?\\,:#@\t\r\n"\[\]_][\u007B-\u00BF]/g, "");

Here is an example of the first part
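
A minimal sketch of running the first character class on its own, without the trailing [\u007B-\u00BF] class. The input string is the one from the question with its backslash and quote escaped and a space added between [ and ]; the variable names are only illustrative.

```javascript
var input = 'test ! % & \\ \' ( ) * + - . / ; < = > ? \\ , : # @ \t \r \n " [ ] _ test';

// Only the first class: each listed character is removed wherever it occurs.
var firstPart = /[!%&'\(\)\*\+\-\.\/;<=>\?\\,:#@\t\r\n"\[\]_]/g;
var result = input.replace(firstPart, "");

// JSON.stringify shows that only letters and spaces are left.
console.log(JSON.stringify(result));

// Collapsing the leftover spaces is an extra step, not part of the regex above.
console.log(result.replace(/ +/g, " "));
```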

musefan
-2
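// Character class listing only the special characters; the hyphen is escaped so it does not form a range.
var re = /[()!%$\\@<>=?\/.:;,#\[\]_*&+\-"]/g;
var str = 'test ! % & \\ \' ( ) * + - . / ; < = > ? \\ , : # @ \t \r \n " [] _ test';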
var m;

while ((m = re.exec(str)) !== null) {
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    // Each match is available through m:
    // m[0] is the matched character, m.index its position in str.
}
Colta Victor