Remove not alphanumeric characters from string

Question

I want to convert the following string to the provided output.

Input:  "\\test\red\bob\fred\new"
Output: "testredbobfrednew"

I've not found any solution that will handle special characters like \r, \n, \b, etc.

Basically I just want to get rid of anything that is not alphanumeric. Here is what I've tried...

Attempt 1: "\\test\red\bob\fred\new".replace(/[_\W]+/g, "");
Output 1:  "testedobredew"

Attempt 2: "\\test\red\bob\fred\new".replace(/['`~!@#$%^&*()_|+-=?;:'",.<>\{\}\[\]\\\/]/gi, "");
Output 2:  "testedobred [newline] ew"

Attempt 3: "\\test\red\bob\fred\new".replace(/[^a-zA-Z0-9]/, "");
Output 3:  "testedobred [newline] ew"

Attempt 4: "\\test\red\bob\fred\new".replace(/[^a-z0-9\s]/gi, '');
Output 4:  "testedobred [newline] ew"

One other attempt, with multiple steps:

function cleanID(id) {
    id = id.toUpperCase();
    id = id.replace( /\t/ , "T");
    id = id.replace( /\n/ , "N");
    id = id.replace( /\r/ , "R");
    id = id.replace( /\b/ , "B");
    id = id.replace( /\f/ , "F");
    return id.replace( /[^a-zA-Z0-9]/ , "");
}

with results:

Attempt 1: cleanID("\\test\red\bob\fred\new");
Output 1: "BTESTREDOBFREDNEW"

Working Solution:

Final Attempt 1: return JSON.stringify("\\test\red\bob\fred\new").replace( /\W/g , '');
Output 1: "testredbobfrednew"

Interesting question, the \n in \new is clearly what's tripping this up. I'm not entirely sure how to find and replace that though *goes searching for regex on whitespace special chars* — Will Buck, Feb 20 '12 at 16:19
Are the inputs escaped/how are they assigned? `var Input = "\\test\red\bob\fred\new"` this string does not contain "red" so your 1st attempt is correct, are you testing against the literal `"\\\\test\\red\\bob\\fred\\new"`? — Alex K., Feb 20 '12 at 16:21
I guess the question is, do backslashes in your input string represent special characters? (Based on your example output, I'm guessing no.) — Dave, Feb 20 '12 at 16:23

AD7six · Accepted Answer · score 630 · 2016-04-01T09:40:07.850

Removing non-alphanumeric chars

The following regex strips non-alphanumeric chars from an input string:

input.replace(/\W/g, '')

Note that \W is the equivalent of [^0-9a-zA-Z_]: because \w includes the underscore, \W does not match it, so underscores are kept. To also remove underscores, use e.g.:

input.replace(/[^0-9a-z]/gi, '')
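A quick way to see the difference between the two patterns is to run both on a string that contains an underscore. The sample string below is made up for illustration:

```javascript
// Hypothetical sample containing an underscore, a hyphen and spaces.
const sample = "snake_case and kebab-case 42";

// \W is [^0-9a-zA-Z_], so the underscore survives.
const withW = sample.replace(/\W/g, "");

// An explicit class without "_" (plus the i flag) strips underscores too.
const alnumOnly = sample.replace(/[^0-9a-z]/gi, "");

console.log(withW);     // "snake_caseandkebabcase42"
console.log(alnumOnly); // "snakecaseandkebabcase42"
```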

The input is malformed

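The test string contains escape sequences (\r, \b, \f, \n), each of which becomes a single control character rather than a backslash followed by a letter. Those control characters are not alphanumeric, so the regex removes them, and with them the r, b, f and n that appear in the source text.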

A backslash in the string needs escaping if it's to be taken literally:

"\\test\\red\\bob\\fred\\new".replace(/\W/g, '')
"testredbobfrednew" // output
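Checking string lengths makes the difference visible: in a literal, "\r" is one character while "\\r" is two. A small sketch using a shortened version of the question's input:

```javascript
// "\r" in a literal is ONE character (carriage return, U+000D);
// "\\r" is TWO characters: a backslash followed by "r".
const malformed = "\\test\red"; // \ t e s t CR e d
const escaped = "\\test\\red";  // \ t e s t \ r e d

console.log(malformed.length); // 8
console.log(escaped.length);   // 9
console.log(malformed.replace(/\W/g, "")); // "tested"
console.log(escaped.replace(/\W/g, ""));   // "testred"
```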

Handling malformed strings

If you're not able to escape the input string correctly (why not?), or it's coming from some kind of untrusted/misconfigured source, you can do something like this:

JSON.stringify("\\test\red\bob\fred\new").replace(/\W/g, '')
"testredbobfrednew" // output

Note that the JSON representation of a string includes the quotes:

JSON.stringify("\\test\red\bob\fred\new")
""\\test\red\bob\fred\new""

But they are also removed by the replacement regex.
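One limitation of this workaround follows from how JSON.stringify escapes strings: JSON has two-character escapes for exactly five control characters (\b, \t, \n, \f, \r). Any other control character is written as a \u00XX escape, so its hex digits survive the \W replacement. A minimal sketch; stripViaJson is a hypothetical helper name and the vertical-tab input is a made-up example:

```javascript
// Hypothetical helper wrapping the JSON.stringify workaround.
function stripViaJson(str) {
  return JSON.stringify(str).replace(/\W/g, "");
}

// \b, \t, \n, \f and \r have short JSON escapes, so their letters come back.
console.log(stripViaJson("\\test\red\bob\fred\new")); // "testredbobfrednew"

// \v (U+000B) has no short JSON escape; JSON.stringify emits \u000b,
// so "u000b" leaks into the result.
console.log(stripViaJson("a\vb")); // "au000bb"
```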

edited Apr 01 '16 at 09:40

answered Feb 20 '12 at 16:23

AD7six

This doesn't remove underscores. – kylex Feb 03 '13 at 04:32
@kylex, that's because underscores are considered to be a part of the alphanumeric bunch, for some reason – Eugene Kuzmenko Mar 01 '13 at 12:59
["Because they are the characters typically legal in variable identifiers."](http://www.perlmonks.org/bare/?node_id=347189). There's no "_" in the question, of course replacing `\W` with `[_\W]` (which is used in the question) or similar would remove underscores. – AD7six Mar 01 '13 at 18:14
@AD7six, could you please elaborate as to why one should be using JSON.stringify() when the string is coming from an untrusted source? Is there any security concern not to do so? Thanks! – jbmusso Jul 23 '13 at 17:19
@guithor It's not that "one should", or that it affects security at all; if "some string" is being received and for whatever reason it's basically borked (it's not apparent from the question why the string is received malformed), it allows seeing the string for what it is: http://jsfiddle.net/Z6N7C – AD7six Jul 23 '13 at 18:03
@kylex If you want to control which chars are retained, you can simply do `'\\test\\red\\bob\\fred\\new_ - stuff'.match(/[a-zA-Z0-9 -]/g).join('')` (allows alphanumeric, space and hyphen, but not underscore) – Mikael Lirbank Sep 17 '15 at 22:57
what about avoiding the space? – Shift 'n Tab Jan 17 '18 at 02:30
This does not support international characters. – Pål Thingbø May 19 '20 at 08:28
[^0-9a-z] removes uppercase alphabets as well – chetan Feb 02 '21 at 07:59
@chetan the [ignore case flag](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/replace#defining_the_regular_expression_in_replace) used in the answer means it would not remove upper case characters, it could instead be written as `input.replace(/[^0-9a-zA-Z]/g, '')` - but that's not functionally different than the answer. Feel free to edit the answer to address any confusion you encountered. – AD7six Feb 09 '21 at 18:14

score 84 · Answer 2 · edited Aug 04 '20 at 14:32

All of the current answers still have quirks; the best thing I could come up with was:

string.replace(/[^A-Za-z0-9]/g, '');

Here's an example that captures every key I could find on the keyboard:

var string = '123abcABC-_*(!@#$%^&*()_-={}[]:\"<>,.?/~`';
var stripped = string.replace(/[^A-Za-z0-9]/g, '');
console.log(stripped);

Outputs: '123abcABC'.

edited Aug 04 '20 at 14:32

HoldOffHunger

answered Aug 24 '15 at 22:10

Deminetix

`input.replace(/\W/g, '')` leaves in the `_` in a String. @Deminetix is right `string.replace(/[^A-Za-z0-9]/g, '');` works better as it removes all non-alphanumeric chars from the String. – Tim Feb 13 '16 at 01:52
And yet, no permutation of this answer _actually answers the question asked_. – AD7six Apr 21 '16 at 13:18
' & ' becomes '---' at the mo'. Is there a way to build that check in so it only has one hyphen for multiple replacements next to each other? – v3nt Mar 15 '21 at 17:05
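Assuming the replacement in that comment is something like replace(/[^A-Za-z0-9]/g, '-'), adding the + quantifier makes each run of consecutive non-alphanumeric characters a single match, so the run becomes a single hyphen. A minimal sketch; hyphenate is a hypothetical name, trimming leading/trailing hyphens is an extra step not in the comment, and the sample strings are made up:

```javascript
// Hypothetical helper: replace each run of non-alphanumeric characters
// with one hyphen, then drop hyphens left at either end.
function hyphenate(str) {
  return str
    .replace(/[^A-Za-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

console.log(hyphenate("Tom & Jerry"));     // "Tom-Jerry"
console.log(hyphenate("  hello, world!")); // "hello-world"
```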

score 11 · Answer 3 · edited Aug 04 '20 at 14:42

You can try this regex:

value.replace(/[\W_]/g, '');
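Without the u flag, \w is exactly [A-Za-z0-9_], so [\W_] should match the same characters as [^A-Za-z0-9] from the previous answer. A small check, written for illustration rather than taken from the answer, compares the two classes on every UTF-16 code unit:

```javascript
// Neither regex uses the g flag, so test() keeps no state between calls.
const withUnderscoreClass = /[\W_]/;
const explicitClass = /[^A-Za-z0-9]/;

const mismatches = [];
for (let code = 0; code <= 0xffff; code++) {
  const ch = String.fromCharCode(code);
  if (withUnderscoreClass.test(ch) !== explicitClass.test(ch)) {
    mismatches.push(code);
  }
}

console.log(mismatches.length); // 0
```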

edited Aug 04 '20 at 14:42

HoldOffHunger

answered Dec 17 '15 at 03:55

myrcutio

score 11 · Answer 4 · edited May 14 '16 at 14:06

The problem is not with how you replace the characters, the problem is with how you input the string.

It's only the first backslash in the input that is a backslash character, the others are part of the control characters \r, \b, \f and \n.

As those backslashes are not separate characters, but part of the notation to write a single control character, they can't be removed separately. I.e. you can't remove the backslash from \n as it's not two separate characters; it's the way that you write the control character LF, or line feed.

If you actually want to turn that input into the desired output, you would need to replace each control character with the corresponding letter, e.g. replace the character \n with the character n.

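To replace the backspace character you need a character set, [\b], because outside a character set \b means a word boundary. \r, \f and \n match the same character with or without the brackets; the code below uses the [ ] form for all four: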

var input = "\\test\red\bob\fred\new";

var output = input
    .replace(/[\r]/g, 'r')
    .replace(/[\b]/g, 'b')
    .replace(/[\f]/g, 'f')
    .replace(/[\n]/g, 'n')
    .replace(/\\/g, '');

Demo: http://jsfiddle.net/SAp4W/
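The chain above handles the four control characters in this particular input. A minimal sketch of the same approach with one lookup table and a replacement callback; controlToLetter and restoreEscapedLetters are hypothetical names, and the \t and \v entries are an assumption about what else might appear in similar input:

```javascript
// Hypothetical table mapping each control character back to the letter
// of its escape sequence. \t and \v are extra, assumed entries.
const controlToLetter = {
  "\r": "r",
  "\b": "b",
  "\f": "f",
  "\n": "n",
  "\t": "t",
  "\v": "v",
};

// Hypothetical helper name.
function restoreEscapedLetters(input) {
  return input
    // Inside a character set, \b means backspace, not a word boundary.
    .replace(/[\r\b\f\n\t\v]/g, (ch) => controlToLetter[ch])
    // Remove the remaining literal backslashes.
    .replace(/\\/g, "");
}

console.log(restoreEscapedLetters("\\test\red\bob\fred\new")); // "testredbobfrednew"
```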

I understand everything you are saying, but the question still stands and no one has suggested the correct answer yet. The input can be changed, but no one has suggested how to change it programmatically in JS. — Bobby Cannon, Feb 20 '12 at 17:46
@BobbyCannon: I added code that takes your exact input and produces the desired output. — Guffa, Feb 20 '12 at 18:07

Ledorub · Answer 5 · 2023-02-05T16:16:52.020

You can use \p{L} or \p{Letter} to find letters from any language and \d to find digits.

str.replace(/[^\p{L}\d]/gu, '')

^ negates the character set: it matches anything that is not \p{L} and not \d

Flags:

g (global) to perform as many replacements as necessary
u (unicode) to recognize Unicode escape sequences (like \p{L}).

Example:

function removeNonAlphaNumeric(str) {
  return str.replace(/[^\p{L}\d]/gu, '')
}

const sequences = [
  'asdé5kfjdk?',
  'uQjoFß^ßI$jI',
  '无论3如何?!',
  'фв@#ео1'
]

for (const seq of sequences) {
  console.log(removeNonAlphaNumeric(seq))
}
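One caveat, from how JavaScript regular expressions work rather than from the answer: \d only matches the ASCII digits 0-9 even with the u flag, and \p{L} does not match combining marks, so digits from other scripts and decomposed accents are removed. A variant that also keeps \p{N} (numbers) and \p{M} (marks) might look like this; removeNonAlphaNumericUnicode is a hypothetical name, and whether you want those categories kept depends on your data:

```javascript
// Hypothetical helper: keep letters, marks and numbers from any script.
function removeNonAlphaNumericUnicode(str) {
  return str.replace(/[^\p{L}\p{M}\p{N}]/gu, "");
}

const arabicIndicThree = "\u0663"; // "٣", ARABIC-INDIC DIGIT THREE
const decomposedE = "e\u0301";     // "e" + COMBINING ACUTE ACCENT

// The \p{L}\d version drops both of these:
console.log(("abc" + arabicIndicThree + "!").replace(/[^\p{L}\d]/gu, "")); // "abc"
console.log(decomposedE.replace(/[^\p{L}\d]/gu, ""));                      // "e"

// The broader version keeps them:
console.log(removeNonAlphaNumericUnicode("abc" + arabicIndicThree + "!")); // "abc٣"
console.log(removeNonAlphaNumericUnicode(decomposedE)); // "e\u0301", renders as "é"
```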

score 5 · Answer 6 · answered Jan 31 '21 at 12:42

To include Arabic letters alongside English letters, you can use:

// Output: نصعربي
"ن$%^&*(ص ع___ربي".replace(/[^0-9a-z\u0600-\u06FF]/gi, '');
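To expand on the code points: U+0600 to U+06FF is the Unicode "Arabic" block, so the class keeps ASCII letters and digits plus anything in that block. On engines that support Unicode property escapes (ES2018 and later), an alternative sketch names the script instead of a block range; keepArabic is a hypothetical name. The two sets are not identical: the Script=Arabic property also covers Arabic letters in other blocks, while some punctuation inside the block (such as the Arabic comma) is assigned to the Common script.

```javascript
// Hypothetical helper: keep ASCII letters/digits and characters whose
// Unicode Script property is Arabic. \p{...} requires the u flag.
const keepArabic = (str) => str.replace(/[^0-9a-z\p{Script=Arabic}]/giu, "");

console.log(keepArabic("ن$%^&*(ص ع___ربي")); // "نصعربي"
```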

answered Jan 31 '21 at 12:42

Abdulrahman Hashem

Could you elaborate on the inclusion of the code points and why this works? The answer could be more useful if it were more complete. – Lex Aug 04 '21 at 20:36

score 3 · Answer 7 · edited Feb 07 '22 at 20:59

Here is an example that you can use:

function removeNonAlphaNumeric(str){
    return str.replace(/[\W_]/g,"");
}

removeNonAlphaNumeric("0_0 (: /-\\ :) 0-0"); // returns "0000"

edited Feb 07 '22 at 20:59

rassar

answered Apr 21 '20 at 03:12

ravi kishore

Roman · Answer 8 · score 2 · 2021-06-17T19:14:40.683

If you need to support another language in addition to English, add the corresponding Unicode block range. Here is an example for Cyrillic:

.replace(/[^0-9A-Za-z_\u0400-\u04FF]/gi, '')
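The same idea generalizes to any set of scripts. A minimal sketch, assuming Unicode property escapes are available, that builds the character class from a list of script names; keepScripts is a hypothetical helper, an unknown script name makes the RegExp constructor throw a SyntaxError, and unlike the regex above it does not keep underscores:

```javascript
// Hypothetical helper: remove everything except ASCII letters/digits and
// characters from the listed Unicode scripts.
function keepScripts(str, scripts) {
  // e.g. ["Cyrillic"] -> "\p{Script=Cyrillic}"
  const props = scripts.map((name) => `\\p{Script=${name}}`).join("");
  const pattern = new RegExp(`[^0-9A-Za-z${props}]`, "gu");
  return str.replace(pattern, "");
}

console.log(keepScripts("Привет, world_1!", ["Cyrillic"])); // "Приветworld1"
console.log(keepScripts("Привет мир", ["Latin"]));          // ""
```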

edited Jun 17 '21 at 19:14

answered Jun 01 '21 at 20:15

Roman

Flavio · Answer 9 · 2019-01-08T02:34:35.350

This removes all non-alphanumeric characters, preserves capitalization, and preserves spaces between words.

function alpha_numeric_filter (string) {

  // Characters to keep: ASCII letters, digits and the space character
  const alpha_numeric = Array.from('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' + ' ')

  // JSON.stringify turns control characters such as \n back into a backslash plus a letter
  const json_string = JSON.stringify(string)

  let filtered_string = ''

  for (let i = 0; i < json_string.length; i++) {

    const char = json_string[i]
    const index = alpha_numeric.indexOf(char)
    if (index > -1) {
      filtered_string += alpha_numeric[index]
    }

  }

  return filtered_string

}

const input = "\\test\red\bob\fred\new"
console.log(alpha_numeric_filter(input)) //=> testredbobfrednew

const complex_string = "/_&_This!&!! is!@#$% a%^&*() Sentence+=-[]{} 123:;\|\\]||~`/.,><"
console.log(alpha_numeric_filter(complex_string)) //=> This is a Sentence 123
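The loop above can be collapsed into a single replace on the JSON text. A sketch of an equivalent, assuming the same allowed set (ASCII letters, digits and the space character); alphaNumericFilter is a hypothetical name and the second sample string is made up:

```javascript
// Hypothetical one-line equivalent of alpha_numeric_filter.
function alphaNumericFilter(string) {
  // Stringify first so escapes like \n come back as "\" + "n",
  // then drop everything that is not an ASCII letter, digit or space.
  return JSON.stringify(string).replace(/[^A-Za-z0-9 ]/g, "");
}

console.log(alphaNumericFilter("\\test\red\bob\fred\new")); // "testredbobfrednew"
console.log(alphaNumericFilter("Hello, World 42!"));        // "Hello World 42"
```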

@AD7six thank you for pointing out my error. When I copy-pasted the input into WebStorm, it automatically added 2 extra backslashes to each existing backslash. I failed to notice this. input = "\\test\red\bob\fred\new" --> copy_paste = "\\\\test\\red\\bob\\fred\\new". — Flavio, Jan 08 '19 at 02:16

score -3 · Answer 10 · answered Feb 20 '12 at 16:22

If you want to have this \\test\red\bob\fred\new string, you should escape all backslashes (\). When you write \\test\\red\\bob\\fred\\new your string actually contains single backslashes. You can confirm this by printing your string.
So if backslashes in your string are escaped myString.replace(/\W/g,'') will work normally.
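If the string is written in your own source code, one way to keep its backslashes without doubling each one by hand is the built-in String.raw template tag, which returns the literal's raw text with escape sequences left uninterpreted. This only helps for literals in source; data that already contains control characters at runtime needs one of the other approaches.

```javascript
// String.raw keeps "\r", "\b", "\f" and "\n" as backslash + letter.
const raw = String.raw`\\test\red\bob\fred\new`;

console.log(raw.length);             // 23
console.log(raw.replace(/\W/g, "")); // "testredbobfrednew"
```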

answered Feb 20 '12 at 16:22

shift66

If you want to suggest to "you should escape all backslashes (\)" then you need to provide an example on how to do it. – Bobby Cannon Feb 20 '12 at 17:49
What do you think double backslashes are??? And what do I mean by saying "When you write \\test\\red\\bob\\fred\\new your string actually contains single backslashes."??? Does this not explain it? – shift66 Feb 20 '12 at 17:52
The input is "\\test\red\bob\fred\new" and cannot change. I need a solution for that input string. If you want to show me how to "escape the backslashes" then give an example. We cannot change the input. See the accepted answer. The solution allowed for the input to not change but gave the desired output. – Bobby Cannon Feb 20 '12 at 17:55
