74

I need a function to strip a set of illegal characters from a string in JavaScript: |&;$%@"<>()+,

This is a classic problem to be solved with regexes, which means now I have 2 problems.

This is what I've got so far:

var cleanString = dirtyString.replace(/\|&;\$%@"<>\(\)\+,/g, "");

I am escaping the regex special chars with a backslash but I am having a hard time trying to understand what's going on.

If I try single literals in isolation, most of them seem to work, but once I put them together in the same regex, the replace breaks depending on the order.

For example, this won't work: dirtyString.replace(/\|<>/g, "")

Help appreciated!

JohnIdol

4 Answers

130

What you need is a character class. Inside one, you only have to worry about the ], \ and - characters (and ^ if you place it immediately after the opening "[").

Syntax: [characters], where characters is the list of characters to match.

Example:

var cleanString = dirtyString.replace(/[|&;$%@"<>()+,]/g, "");
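To illustrate the escaping rules above, here is a minimal sketch. The function names stripIllegal and stripWithSpecials are made up for this example, and the second character set (with ], \, ^ and -) is an assumption added purely to show the escaping, not part of the original list.

```javascript
// Sketch only: both helper names are illustrative, not from the question.
function stripIllegal(dirtyString) {
  // Inside [...] most metacharacters are literal, so | $ ( ) + need no backslash.
  return dirtyString.replace(/[|&;$%@"<>()+,]/g, "");
}

function stripWithSpecials(dirtyString) {
  // ] and \ are escaped; ^ is not first and - is last, so both are literal.
  return dirtyString.replace(/[|&;$\]\\^-]/g, "");
}

console.log(stripIllegal('a|b&c;d$e%f@g"h<i>j(k)l+m,n')); // "abcdefghijklmn"
console.log(stripWithSpecials("a]b\\c^d-e|f"));           // "abcdef"
```

If in doubt about a character, escaping it with a backslash inside the class is harmless for the ones shown here.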
Lekensteyn
  • 1
    are escape chars not needed within the character class? – JohnIdol Sep 23 '10 at 17:27
  • 7
    No, only for `]` and `\` (and `^` unless you put it anywhere other than in the first position, and `-` unless you put it first or last). – Tim Pietzcker Sep 23 '10 at 17:55
  • https://regex101.com/r/xK3YoV/1 I am trying to remove a backslash in front of a string. The regular expression I am using seems to be valid, please refer to the link above. But the backslash is not getting removed. Only the string is getting replaced. Any tips? {{CreatedBy.replace("NT1\/","") | uppercase}} – Phoenix Feb 06 '19 at 05:10
  • 1
    @Phoenix The regular expression is correct, but depending on the language you might have to remove the backslash or escape it another time. By the way, if this is Javascript, you are doing a literal string replacement. For a regex search-and-replace, use `/NT1\//` instead of `"NT1\/"`. – Lekensteyn Feb 06 '19 at 11:02
105

I tend to look at it from the inverse perspective which may be what you intended:

What characters do I want to allow?

This is because lots of characters you wouldn't expect could make it into a string somehow and blow stuff up.

For example, this one allows only letters and numbers, replacing each run of invalid characters with a hyphen:

"This¢£«±Ÿ÷could&*()\/<>be!@#$%^bad".replace(/([^a-z0-9]+)/gi, '-');
//Result: "This-could-be-bad"
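Note that the a-z0-9 whitelist above also strips accented letters. As a hedged variation, assuming an engine that supports Unicode property escapes (ES2018 and later), the whitelist can be widened to any letter or number. The helper name toSafeString is hypothetical, and trimming the leading and trailing hyphens is an extra step not present in the answer.

```javascript
// Hypothetical helper; the name toSafeString is not from the original answer.
function toSafeString(input) {
  return input
    // \p{L} = any letter, \p{N} = any number; both require the u flag.
    .replace(/[^\p{L}\p{N}]+/gu, "-")
    // Drop a hyphen left at the start or end of the result.
    .replace(/^-+|-+$/g, "");
}

console.log(toSafeString("This&*()<>could!@#be%^bad"));   // "This-could-be-bad"
console.log(toSafeString("  caf\u00E9 & cr\u00E8me!  ")); // "café-crème"
```

Whether accented letters should count as valid depends on where the cleaned string ends up, so treat the wider whitelist as one option rather than a default.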
John Culviner
  • 1
    Not replacing spaces and other characters may also be useful, e.g.: `/([^\w\s+*:;,.()/\\]+)/gi` (a negated [character class](https://www.regular-expressions.info/charclass.html) doesn't need to escape most characters, like `.` or `/`). – CPHPython Mar 14 '18 at 18:17
7

You need to wrap them all in a character class. The current version means replace this sequence of characters with an empty string. When wrapped in square brackets it means replace any of these characters with an empty string.

var cleanString = dirtyString.replace(/[\|&;\$%@"<>\(\)\+,]/g, "");
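The difference between a sequence and a character class can be seen with a small sketch. The sample input string is made up for illustration.

```javascript
// Sample input invented for this example.
const dirtyString = "a|<>b | c <d>";

// Without brackets the pattern is a sequence: only the exact run "|<>" matches.
const asSequence = dirtyString.replace(/\|<>/g, "");

// With brackets it is a set: each |, < or > is removed wherever it occurs.
const asClass = dirtyString.replace(/[|<>]/g, "");

console.log(asSequence); // "ab | c <d>"
console.log(asClass);    // "ab  c d"
```

This is why the unbracketed version appears to work for a single character but breaks once several are combined: the characters then have to appear together, in that exact order.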
ChaosPandion
6

Put them in brackets []:

var cleanString = dirtyString.replace(/[\|&;\$%@"<>\(\)\+,]/g, "");
Darin Dimitrov