16

I want to remove all special characters from a string. Allowed characters are A-Z (uppercase or lowercase), digits (0-9), underscore (_), white space ( ), percentage (%), and the dot sign (.).

I have tried this:

        StringBuilder sb = new StringBuilder();
        foreach (char c in input)
        {
            if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') | c == '.' || c == '_' || c == ' ' || c == '%')
            { sb.Append(c); }
        }
        return sb.ToString();

And this:

        Regex r = new Regex("(?:[^a-z0-9% ]|(?<=['\"])s)", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled); 
        return r.Replace(input, String.Empty); 

But nothing seems to be working. Any help will be appreciated.

Thank you!

OBL
  • 1
    Is this a type? "'z') | c == '.' ||"? – Matt Dawdy Apr 15 '11 at 18:19
  • Thank you so much for all the responses. All of them worked for me. I just realized that I had forgotten to re-publish, which is why it was still eliminating the white spaces too. – OBL Apr 15 '11 at 18:50
  • @Matt Dawdy: I think it is a typo - and so is "type" :p – Aasmund Eldhuset Apr 15 '11 at 19:03
  • @Aasmund Eldhuset -- that's freaking funny. At least I wasn't rude about it! :) – Matt Dawdy Apr 17 '11 at 02:56
  • Have you looked at [this](http://stackoverflow.com/questions/1120198/most-efficient-way-to-remove-special-characters-from-string) thread on StackOverflow? This guy has a working implementation that you seem to want. – Tejs Apr 15 '11 at 18:19

7 Answers

45
Regex.Replace(input, "[^a-zA-Z0-9% ._]", string.Empty)
Sanjeevakumar Hiremath
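As a minimal sketch of how the one-liner above could be wrapped for reuse: the class name `InputSanitizer` and the method name `RemoveSpecialCharacters` are hypothetical and not part of the original answer. The pattern is the same negated character class; the only change is that the `Regex` is built once and cached in a static field.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper class; the name is an assumption for illustration.
public static class InputSanitizer
{
    // Matches any character that is NOT an ASCII letter, a digit,
    // '%', a space, '.', or '_'.
    private static readonly Regex Disallowed =
        new Regex("[^a-zA-Z0-9% ._]", RegexOptions.Compiled);

    public static string RemoveSpecialCharacters(string input)
    {
        // Every disallowed character is replaced with nothing.
        return Disallowed.Replace(input, string.Empty);
    }
}
```

For example, `InputSanitizer.RemoveSpecialCharacters("50% off_now!.txt")` should return `"50% off_now.txt"`, since only the `!` falls outside the allowed set.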
16

You can simplify the first method to

StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
    if (Char.IsLetterOrDigit(c) || c == '.' || c == '_' || c == ' ' || c == '%')
    { sb.Append(c); }
}
return sb.ToString();

which seems to pass simple tests. You can shorten it using LINQ

return new string(
    input.Where(
        c => Char.IsLetterOrDigit(c) || 
            c == '.' || c == '_' || c == ' ' || c == '%')
    .ToArray());
Yuriy Faktorovich
  • 11
    Be careful with `Char.IsLetterOrDigit`, since it considers *all* Unicode letters and digits. So `Char.IsLetterOrDigit('Ѝ')` returns `true`, because that's a letter in the Cyrillic alphabet. – Jim Mischel Apr 15 '11 at 20:38
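If only the ASCII letters and digits listed in the question should survive, one possible sketch is to keep the explicit range checks from the question and move them into a small helper. The names `AsciiFilter`, `IsAllowed`, and `Clean` are hypothetical and chosen only for this illustration.

```csharp
using System.Text;

// Hypothetical helper class; the names are assumptions for illustration.
public static class AsciiFilter
{
    // True only for the characters the question allows:
    // ASCII letters, ASCII digits, '.', '_', ' ', and '%'.
    private static bool IsAllowed(char c)
    {
        return (c >= '0' && c <= '9')
            || (c >= 'A' && c <= 'Z')
            || (c >= 'a' && c <= 'z')
            || c == '.' || c == '_' || c == ' ' || c == '%';
    }

    public static string Clean(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (IsAllowed(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With this version, `AsciiFilter.Clean("aЍb")` should return `"ab"`, whereas a filter based on `Char.IsLetterOrDigit` would keep the Cyrillic letter.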
6

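The first approach seems correct, except that you have a | (the non-short-circuiting OR) instead of a || before c == '.'. With bool operands both operators give the same result here, so this is a typo rather than the cause of the problem.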

By the way, you should state what doesn't work: does it fail to compile, does it crash, or does it produce wrong output?

Aasmund Eldhuset
3
StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
    if (char.IsLetterOrDigit(c) || "_ %.".Contains(c.ToString()))
        sb.Append(c);
}
return sb.ToString();
Jim Bolla
1
private string RemoveReservedCharacters(string strValue)
{
    char[] ReservedChars = {'/', ':','*','?','"', '<', '>', '|'};

    foreach (char strChar in ReservedChars)
    {
        strValue = strValue.Replace(strChar.ToString(), "");
    }
    return strValue;
}
Jake1164
  • Acceptable solution because of its simplicity. Performance is not that great, because it creates a new string for each replaced character. Why are the ReservedChars not a string[]? Then you wouldn't need to call ToString each time. – Stefan Steinegger Oct 02 '12 at 09:38
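The performance concern raised in the comment above could be addressed with a single pass over the input, roughly as sketched here. This is one possible variant, not the answer author's code; the class name `ReservedCharStripper` is hypothetical, and the reserved set is copied from the answer.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical class name; the reserved set is taken from the answer above.
public static class ReservedCharStripper
{
    private static readonly HashSet<char> ReservedChars =
        new HashSet<char> { '/', ':', '*', '?', '"', '<', '>', '|' };

    public static string RemoveReservedCharacters(string value)
    {
        // One pass, one StringBuilder: no intermediate string
        // is created per reserved character.
        var sb = new StringBuilder(value.Length);
        foreach (char c in value)
        {
            if (!ReservedChars.Contains(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Note that this is a blacklist, so unlike the other answers it keeps every character that is not explicitly listed.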
1

This is how my version might look.

StringBuilder sb = new StringBuilder();
foreach (char c in input)
{
    if (Char.IsLetterOrDigit(c) ||
        c == '.' || c == '_' || c == ' ' || c == '%')
    {
        sb.Append(c);
    }
}
return sb.ToString();
Jonathan Wood
  • `IsLetterOrDigit` allows *all* Unicode letters and digits. For example, calling it with 'Ѝ' will return `true`. – Jim Mischel Apr 15 '11 at 20:40
  • @Jim: Yes, I understand that. But while the original question only checked for English alphabet characters, it also didn't address the issue of international characters. It seemed quite possible that `Char.IsLetterOrDigit()` would meet the OP's requirements. – Jonathan Wood Apr 15 '11 at 22:46
1

Cast each char to an int, then compare its ASCII code against the ASCII table, which is widely available online, for example at http://www.asciitable.com/

    {
        char[] input = txtInput.Text.ToCharArray();
        StringBuilder sbResult = new StringBuilder();

        foreach (char c in input)
        {
            int asciiCode = (int)c;
            if (
                //Space
                asciiCode == 32
                ||
                // Period (.)
                asciiCode == 46
                ||
                // Percentage Sign (%)
                asciiCode == 37
                ||
                // Underscore
                asciiCode == 95
                ||
                ( //0-9, 
                    asciiCode >= 48
                    && asciiCode <= 57
                )
                ||
                ( //A-Z
                    asciiCode >= 65
                    && asciiCode <= 90
                )
                ||
                ( //a-z
                    asciiCode >= 97
                    && asciiCode <= 122
                )
            )
            {
                sbResult.Append(c);
            }
        }

        txtResult.Text = sbResult.ToString();
    }
essedbl