765

How do I remove all non alphanumeric characters from a string except dash and space characters?

Codeman
Luke101

13 Answers

1072

Replace [^a-zA-Z0-9 -] with an empty string.

Regex rgx = new Regex("[^a-zA-Z0-9 -]");
str = rgx.Replace(str, "");
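As a minimal sketch of the replacement above, the pattern can be wrapped in a reusable helper so the Regex object is built only once. The class name `TextCleaner` and the method name `KeepAlphanumericDashSpace` are hypothetical, chosen only for illustration. The dash is placed last in the character class so that it is read as a literal dash rather than as a range operator.

```csharp
using System.Text.RegularExpressions;

public static class TextCleaner // hypothetical name, not from the original answer
{
    // Built once and reused on every call. The trailing '-' is a literal
    // dash because it sits at the end of the character class.
    private static readonly Regex Disallowed = new Regex("[^a-zA-Z0-9 -]");

    public static string KeepAlphanumericDashSpace(string input)
    {
        // Regex.Replace substitutes every match in the input, not just the first.
        return Disallowed.Replace(input, "");
    }
}
```

For example, `TextCleaner.KeepAlphanumericDashSpace("50025454$$ a-b")` returns `"50025454 a-b"`.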
Amarghosh
  • 100
    Worth mentioning that `-` must be at the end of the character class, or escaped with a backslash, to prevent being used for a range. – Peter Boughton Jul 09 '10 at 09:18
  • I am using classic ASP (not C# as the original question is tagged) - if I enter: 50025454$ this works fine but if I enter 50025454$$ this fails. (I need to add + to the regex). Is this the same in C#? – Dan Sep 21 '10 at 15:20
  • That said - using Regex rgx = new Regex("[^a-zA-Z0-9 -]+") and trying 5002$5454$ still fails. – Dan Sep 21 '10 at 15:22
  • 6
    @Dan set the global flag in your regex - without that, it just replaces the first match. A quick google should tell you how to set global flag in classic ASP regex. Otherwise, look for a `replaceAll` function instead of `replace`. – Amarghosh Sep 22 '10 at 03:49
  • 24
    Here's a regex compiled version: `return Regex.Replace(str, "[^a-zA-Z0-9_.]+", "", RegexOptions.Compiled);` [Same basic question](http://stackoverflow.com/questions/1120198/most-efficient-way-to-remove-special-characters-from-string/1120248#1120248) – Paige Watson Sep 30 '11 at 16:35
  • 1
    @Amarghosh: Hi, just a friendly suggestion that string.Empty would be preferable to using "" in the replace function. – Brian Scott Oct 18 '11 at 14:35
  • 2
    \w should be used for alphanumeric characters. a-z will not match diacritic characters, while \w will. – valentinas Apr 11 '12 at 00:29
  • 2
    The regex solution is slower than the code solution shown below. – mas_oz2k1 May 03 '12 at 12:07
  • 17
    @MGOwen because every time you use "" you are creating a new object due to strings being immutable. When you use string.empty you are reusing the single instance required for representing an empty string which is quicker as well as being more efficient. – Brian Scott Jun 18 '12 at 11:09
  • 19
    @BrianScott I know this is old, but was found in a search so I feel this is relevant. This actually depends on the version of .NET you are running under. > 2.0 uses `""` & `string.Empty` exactly the same. http://stackoverflow.com/questions/151472/what-is-the-difference-between-string-empty-and-empty-string – Jared Oct 23 '12 at 21:08
  • 1
    @valentinas, \w would include underscore and 9 other punctuation connectors, which OP doesn't want: http://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx#WordCharacter – Christopher Apr 06 '14 at 18:02
  • I added as extension method: public static string StripNonAlphaNumeric(this string str) { Regex rgx = new Regex("[^a-zA-Z0-9 -]"); return rgx.Replace(str, string.Empty); } – Rob Sedgwick Jan 29 '15 at 14:45
  • @BrianScott Old but `unsafe { fixed (char* stringEmpty = String.Empty) fixed (char* emptyString = "") Debug.Assert(stringEmpty == emptyString); }` – Tom Blodget Dec 02 '15 at 03:16
  • 1
    I use `[^a-zA-Z0-9çşöüğıiIİ -]` if I don't want Turkish characters to be removed. – Gokhan Kurt May 09 '16 at 08:45
  • @Amarghosh I do not want to remove space from string. what i need to add in given regex. – Satish Singh Nov 22 '16 at 09:54
  • What if I wanted to keep the new line instead of space and '-'. I tried, `regex.replace(text,"[^0-9\n]","")` . But it is not working. – Mohammed Julfikar Ali Mahbub Oct 24 '17 at 18:57
  • @MohammadZulfikar Please explain what you mean by not working. What happens? Also, you'd get better results if you ask it as a separate question. You might want to read about the multiline flag of regex. – Amarghosh Oct 26 '17 at 15:06
  • Its ok I solved the issue! and Yeah I already asked my own question. Thnx for ur interest in helping :) – Mohammed Julfikar Ali Mahbub Oct 26 '17 at 15:08
392

I could have used a regex; regexes can provide an elegant solution, but they can cause performance issues. Here is one solution:

char[] arr = str.ToCharArray();

arr = Array.FindAll<char>(arr, (c => (char.IsLetterOrDigit(c) 
                                  || char.IsWhiteSpace(c) 
                                  || c == '-')));
str = new string(arr);

When using the Compact Framework (which doesn't have FindAll), replace FindAll with¹:

char[] arr = str.Where(c => (char.IsLetterOrDigit(c) || 
                             char.IsWhiteSpace(c) || 
                             c == '-')).ToArray(); 

str = new string(arr);
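The same filter can also be written as a single pass over the string with a StringBuilder. This is only a sketch: the class name `CharFilter` and the method name are hypothetical, and no performance claim is made for it. Note that `char.IsLetterOrDigit` accepts any Unicode letter or digit (not just a-z, A-Z, 0-9) and `char.IsWhiteSpace` accepts tabs and newlines as well as spaces.

```csharp
using System.Text;

public static class CharFilter // hypothetical name, for illustration only
{
    public static string KeepLettersDigitsWhitespaceAndDash(string input)
    {
        // Pre-size the builder: the result is never longer than the input.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            // Same predicate as the FindAll / Where versions above.
            if (char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, `CharFilter.KeepLettersDigitsWhitespaceAndDash("a_b-c d!")` returns `"ab-c d"`.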

¹ Comment by ShawnFeatherly

Community
ata
  • 48
    in my testing, this technique was much faster. to be precise, it was just under 3 times faster than the Regex Replace technique. – Dan Aug 11 '11 at 15:49
  • 14
    The compact framework doesn't have FindAll, you can replace FindAll with `char[] arr = str.Where(c => (char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-')).ToArray();` – ShawnFeatherly Jan 25 '13 at 22:14
  • 2
    has anyone tested this? That didn't work at all. --but this did for me: string str2 = new string(str.Where(c => (char.IsLetterOrDigit(c))).ToArray()); – KevinDeus Dec 16 '16 at 21:31
  • As a single line `str = string.Concat(str.Where(c => Char.IsLetterOrDigit(c) || Char.IsWhiteSpace(c)))` – VDWWD May 11 '21 at 15:33
  • You present `.Where` as being a bit of a last resort if `Array.FindAll` isn't available, but it seems quite a bit simpler to me. Is there any reason you prefer `FindAll`? – Arthur Tacca Jul 10 '23 at 10:11
79

You can try:

string s1 = Regex.Replace(s, "[^A-Za-z0-9 -]", "");

Where s is your string.

miken32
josephj1989
51

Using System.Linq:

string withOutSpecialCharacters = new string(stringWithSpecialCharacters.Where(c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-').ToArray());
w.b
Zain Ali
  • @Michael It is similar but at least this is a one liner, rather than 3 lines. I'd say that's enough to make it a different answer. – Dymas Apr 25 '19 at 17:37
  • 1
    @Dymas I now agree that it is acceptable, but **not** because the whitespace is different. Apparently the part that is functionally equivalent (only var names differ) was edited in after this answer was written. – Michael Apr 25 '19 at 18:26
  • 1
    @ZainAli, if you make a trivial edit and ping me, I'll reverse my downvote. I apologize for any insinuation of plagiary. – Michael Apr 25 '19 at 18:28
31

The regex is [^\w\s\-]*:

\s is better to use than a literal space, because there might be a tab in the text.
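To make the behavior of this pattern concrete, here is a small sketch. The class and method names are hypothetical. In .NET, \w is Unicode-aware by default and also matches the underscore (connector punctuation), so the first method keeps underscores. The second method is a variation, not part of the answer above, that uses the Unicode categories \p{L} (letters) and \p{Nd} (decimal digits) instead, assuming underscores should be removed.

```csharp
using System.Text.RegularExpressions;

public static class WordClassDemo // hypothetical name, for illustration only
{
    // The pattern from the answer: keeps word characters (including '_'),
    // any whitespace (spaces, tabs, newlines) and dashes.
    public static string StripWithWordClass(string input)
    {
        return Regex.Replace(input, @"[^\w\s-]+", "");
    }

    // Variation: Unicode letters and decimal digits only, plus whitespace
    // and dashes. The underscore is not in \p{L} or \p{Nd}, so it is removed.
    public static string StripWithUnicodeCategories(string input)
    {
        return Regex.Replace(input, @"[^\p{L}\p{Nd}\s-]+", "");
    }
}
```

For the input `"M\u00F6tley_Cr\u00FCe\t#1!"`, the first method returns `"M\u00F6tley_Cr\u00FCe\t1"` (underscore and tab kept) and the second returns `"M\u00F6tleyCr\u00FCe\t1"` (underscore removed).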

True Soft
  • 2
    unless you want to remove tabs. – Matt Ellen Jul 09 '10 at 06:57
  • ...and newlines, and all other characters considered "whitespace". – Peter Boughton Jul 09 '10 at 09:17
  • 9
    This solution is far superior to the above solutions since it also supports international (non-English) characters. string s = "Mötley Crue 日本人: の氏名 and Kanji 愛 and Hiragana あい"; string r = Regex.Replace(s,"[^\\w\\s-]*",""); The above produces r with: Mötley Crue 日本人 の氏名 and Kanji 愛 and Hiragana あい – Dan Gøran Lunde Feb 03 '14 at 12:12
  • 2
    Use @ to escape \ conversion in string: @"[^\w\s-]*" – Jakub Pawlinski Feb 28 '14 at 11:45
  • @danglund Those are alphanumeric characters? – minexew Apr 24 '15 at 06:13
  • 1
    it, uhhh... doesn't remove underscores? that is considered a "word" character by regex implementation across creation, but it's not alphanumeric, dash, or space... (?) – Code Jockey Nov 16 '15 at 14:22
  • Actually this solution is not superior to the second-highest voted solution, which is much better for a performance-critical task, such as a server API that must ensure special characters are not in a string, that could be used thousands of times per minute (or more). – Codefun64 Nov 19 '15 at 17:39
  • @CodeJockey is absolutely correct. `var s = "_"; Console.WriteLine(Regex.IsMatch(s, @"\w"));` Ironically, I recently saw something exactly like this on a project I'm working on where the developer created a **redundant** `Regex` using a pattern like `[\w_]`. – kuujinbo Jan 17 '16 at 21:56
  • This solution didn't work for me. It will remove some stuff, but several non-alphanumeric characters are not removed (example I just found: the Δ symbol, but I'm sure there are many more). It's weird because "\w" sounds equivalent to "A-Za-z0-9" according to the w3schools definition, but it seems that's not the case? – Master_T Aug 27 '19 at 13:09
25

Based on the answer for this question, I created a static class and added these. Thought it might be useful for some people.

public static class RegexConvert
{
    public static string ToAlphaNumericOnly(this string input)
    {
        Regex rgx = new Regex("[^a-zA-Z0-9]");
        return rgx.Replace(input, "");
    }

    public static string ToAlphaOnly(this string input)
    {
        Regex rgx = new Regex("[^a-zA-Z]");
        return rgx.Replace(input, "");
    }

    public static string ToNumericOnly(this string input)
    {
        Regex rgx = new Regex("[^0-9]");
        return rgx.Replace(input, "");
    }
}

Then the methods can be used as:

string example = "asdf1234!@#$";
string alphanumeric = example.ToAlphaNumericOnly();
string alpha = example.ToAlphaOnly();
string numeric = example.ToNumericOnly();
Ppp
18

Want something quick?

using System;
using System.Linq; // needed for allowedCharacters.Contains(c)

public static class StringExtensions
{
    public static string ToAlphaNumeric(this string self,
                                        params char[] allowedCharacters)
    {
        return new string(Array.FindAll(self.ToCharArray(),
                                        c => char.IsLetterOrDigit(c) ||
                                        allowedCharacters.Contains(c)));
    }
}

This will allow you to specify which characters you want to allow as well.

Andreas
  • IMHO - the best solution here. – suchoss Sep 25 '20 at 18:48
  • Looks clean, but a bit hard to specify how to add white space ? I would have added another overload which allows whitespace too as this method works fine on words, but not sentences or other whitespace such as newlines or tabs. +1 anyways, good solution. public static string ToAlphaNumericWithWhitespace(this string self, params char[] allowedCharacters) { return new string(Array.FindAll(self.ToCharArray(), c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || allowedCharacters.Contains(c))); } – Tore Aurstad Aug 26 '21 at 12:15
7

Here is a fast, non-regex solution that keeps heap allocations low, which is what I was looking for.

Unsafe edition:

public static unsafe void ToAlphaNumeric(ref string input)
{
    fixed (char* p = input)
    {
        int offset = 0;
        for (int i = 0; i < input.Length; i++)
        {
            if (char.IsLetterOrDigit(p[i]))
            {
                p[offset] = input[i];
                offset++;
            }
        }
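        // Caution: this overwrites the string's length field and mutates the
        // string in place. It relies on the runtime's internal string layout
        // and breaks the immutability that other code (and interned literals) assume.
        ((int*)p)[-1] = offset; // Changes the length of the string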
        p[offset] = '\0';
    }
}

And for those who don't want to use unsafe or don't trust the string length hack.

public static string ToAlphaNumeric(string input)
{
    int j = 0;
    char[] newCharArr = new char[input.Length];

    for (int i = 0; i < input.Length; i++)
    {
        if (char.IsLetterOrDigit(input[i]))
        {
            newCharArr[j] = input[i];
            j++;
        }
    }

    // Build the string from the first j characters; no need to resize the array first.
    return new string(newCharArr, 0, j);
}
BjarkeCK
4

I've made a different solution that removes the control characters, which was my original problem.

It is better than listing all the "special but good" characters:

char[] arr = str.Where(c => !char.IsControl(c)).ToArray();    
str = new string(arr);

It's simpler, so I think it's better!

th1rdey3
Pinello
3

Here's an extension method using @ata's answer as inspiration.

"hello-world123, 456".MakeAlphaNumeric(new char[]{'-'});// yields "hello-world123456"

or if you require additional characters other than hyphen...

"hello-world123, 456!?".MakeAlphaNumeric(new char[]{'-','!'});// yields "hello-world123456!"


public static class StringExtensions
{   
    public static string MakeAlphaNumeric(this string input, params char[] exceptions)
    {
        var charArray = input.ToCharArray();
        var alphaNumeric = Array.FindAll<char>(charArray, c => char.IsLetterOrDigit(c) || exceptions?.Contains(c) == true);
        return new string(alphaNumeric);
    }
}
Aaron Hudon
0

I use a variation of one of the answers here. I want to replace spaces with "-" so the result is SEO-friendly, and also make it lowercase. It also avoids referencing System.Web from my services layer.

private string MakeUrlString(string input)
{
    var array = input.ToCharArray();

    array = Array.FindAll<char>(array, c => char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || c == '-');

    var newString = new string(array).Replace(" ", "-").ToLower();
    return newString;
}
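One thing to watch in the method above is that `char.IsWhiteSpace` also keeps tabs and newlines, while `Replace(" ", "-")` only converts plain spaces. The following sketch is a variation, not the original author's method: it turns any run of whitespace or dashes into a single dash and trims dashes at both ends. The names `SlugHelper` and `MakeSlug` are hypothetical, and the use of `ToLowerInvariant` is an assumption that culture-independent lowercasing is wanted for URLs.

```csharp
using System.Text;

public static class SlugHelper // hypothetical name, for illustration only
{
    public static string MakeSlug(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (char.IsLetterOrDigit(c))
            {
                sb.Append(char.ToLowerInvariant(c));
            }
            else if ((char.IsWhiteSpace(c) || c == '-')
                     && sb.Length > 0 && sb[sb.Length - 1] != '-')
            {
                // Whitespace or a dash becomes one '-'; leading separators
                // and repeated separators are skipped.
                sb.Append('-');
            }
            // Every other character is dropped.
        }

        // Remove a trailing dash left by separators at the end of the input.
        if (sb.Length > 0 && sb[sb.Length - 1] == '-')
        {
            sb.Length--;
        }
        return sb.ToString();
    }
}
```

For example, `SlugHelper.MakeSlug("  Hello,  World\t- 2024! ")` returns `"hello-world-2024"`.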
Philip Johnson
0

If you are working in JS, here is a very terse version

myString = myString.replace(/[^A-Za-z0-9 -]/g, "");
Jeff
GeekyMonkey
-1

There is a much easier way with Regex. Note that \D matches every non-digit character, so this keeps digits only:

private string FixString(string str)
{
    return string.IsNullOrEmpty(str) ? str : Regex.Replace(str, "[\\D]", "");
}
astef