7

I want to remove all non-letter characters from a string. By non-letter I mean anything that isn't a letter of the alphabet or an apostrophe. This is the code I have.

public static string RemoveBadChars(string word)
{
    char[] chars = new char[word.Length];
    for (int i = 0; i < word.Length; i++)
    {
        char c = word[i];

        if ((int)c >= 65 && (int)c <= 90)
        {
            chars[i] = c;
        }
        else if ((int)c >= 97 && (int)c <= 122)
        {
            chars[i] = c;
        }
        else if ((int)c == 44)
        {
            chars[i] = c;
        }
    }

    word = new string(chars);

    return word;
}

It's close, but doesn't quite work. The problem is this:

[in]: "(the"
[out]: " the"

It gives me a space there instead of the "(". I want to remove the character entirely.

Joel Coehoorn
jack3604
  • Very similar to: http://stackoverflow.com/questions/3210393/how-do-i-remove-all-non-alphanumeric-characters-from-a-string-except-dash – Mephy Dec 30 '14 at 02:38
  • You get a space (actually a null character) because the element in chars[] stays zero wherever there is a bad char. I think you need to shrink the string based on how many bad chars you have – V-SHY Dec 30 '14 at 02:45
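
To illustrate the comment above, here is a minimal sketch of why the gap appears. The class and method names (NullCharDemo, FilterInPlace) are made up for this illustration; the loop is a condensed version of the one in the question.

```csharp
using System;

public static class NullCharDemo
{
    // Returns the intermediate buffer the question's approach builds:
    // slots belonging to rejected characters are never assigned.
    public static char[] FilterInPlace(string word)
    {
        char[] chars = new char[word.Length]; // every slot starts as '\0'
        for (int i = 0; i < word.Length; i++)
        {
            char c = word[i];
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'))
            {
                chars[i] = c; // same index as the source, so gaps remain
            }
        }
        return chars;
    }
}
```

For the input "(the", new string(NullCharDemo.FilterInPlace("(the")) still has length 4, and its first character is '\0' rather than a real space. Many consoles just happen to display it as a blank.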

6 Answers

10

The Char class has a method that could help out. Use Char.IsLetter() to detect valid letters (and an additional check for the apostrophe), then pass the result to the string constructor:

var input = "(the;':";

var result = new string(input.Where(c => Char.IsLetter(c) || c == '\'').ToArray());

Output:

the'

Grant Winney
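
One thing to keep in mind with the approach above: Char.IsLetter() returns true for any Unicode letter, not only A-Z and a-z. If "the alphabet" in the question means ASCII letters only, a variation along these lines might be closer. This is a sketch, not part of the answer; LetterFilter, IsAsciiAlpha, and the two Keep methods are hypothetical names.

```csharp
using System;
using System.Linq;

public static class LetterFilter
{
    // Hypothetical helper: true only for A-Z and a-z.
    private static bool IsAsciiAlpha(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');

    // Keeps any Unicode letter plus the apostrophe (the behavior shown in the answer).
    public static string KeepUnicodeLetters(string input) =>
        new string(input.Where(c => char.IsLetter(c) || c == '\'').ToArray());

    // Keeps only ASCII letters plus the apostrophe.
    public static string KeepAsciiLetters(string input) =>
        new string(input.Where(c => IsAsciiAlpha(c) || c == '\'').ToArray());
}
```

With the input "(café's)", KeepUnicodeLetters keeps the "é" while KeepAsciiLetters drops it.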
6

You should use a regular expression (Regex) instead.

public static string RemoveBadChars(string word)
{
    Regex reg = new Regex("[^a-zA-Z']");
    return reg.Replace(word, string.Empty);
}

If you want to keep spaces as well:

Regex reg = new Regex("[^a-zA-Z' ]");
Dan
  • I've seen this before, but to be honest, I have no idea how that works and that's kind of why I avoided it. What does "[^a-zA-Z]" mean? To me it looks like senseless characters, but they mean something and I can't figure it out. – jack3604 Dec 30 '14 at 02:46
  • How about the apostrophe? – V-SHY Dec 30 '14 at 02:47
  • If you click on the link in my answer, you can see an explanation of all the regular expression operators. @V-SHY Oops, I didn't read his question carefully enough; I've changed my answer. – Dan Dec 30 '14 at 02:50
2
private static Regex badChars = new Regex("[^A-Za-z']");

public static string RemoveBadChars(string word)
{
    return badChars.Replace(word, "");
}

This creates a Regular Expression that consists of a character class (enclosed in square brackets) that looks for anything that is not (the leading ^ inside the character class) A-Z, a-z, or '. It then defines a function that replaces anything that matches the expression with an empty string.
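
As a sketch of how the pieces of that pattern fit together, here is the same expression with each part annotated. The class name RegexDemo is made up for this example; Regex and Regex.Replace come from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexDemo
{
    // [ ... ]  a character class: matches exactly one character from the set
    // ^        directly after the opening bracket: negates the set
    // A-Z a-z  ranges covering the upper- and lower-case ASCII letters
    // '        a literal apostrophe
    // Together: "any single character that is not a letter or an apostrophe".
    private static readonly Regex BadChars = new Regex("[^A-Za-z']");

    public static string RemoveBadChars(string word) =>
        BadChars.Replace(word, string.Empty);
}
```

For example, RegexDemo.RemoveBadChars("(the") returns "the", and RegexDemo.RemoveBadChars("don't!") returns "don't".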

Joel Coehoorn
2

A regular expression would be better, as this is pretty inefficient. But to answer your question: the problem with your code is that it uses i to index both the input and the output. You need a separate variable for the position in chars, incremented only when a character is kept. Something like this:

public static string RemoveBadChars(string word)
{
    char[] chars = new char[word.Length];
    int myindex=0;
    for (int i = 0; i < word.Length; i++)
    {
        char c = word[i];

        if ((int)c >= 65 && (int)c <= 90)
        {
            chars[myindex] = c;
            myindex++;
        }
        else if ((int)c >= 97 && (int)c <= 122)
        {
            chars[myindex] = c;
            myindex++;
        }
        else if ((int)c == 39) // 39 is the apostrophe; 44 is the comma
        {
            chars[myindex] = c;
            myindex++;
        }
    }

    word = new string(chars);

    return word;
}
Brandon Spilove
  • Thanks, I know that I could use Regex, but I was trying to do it without it, plus I don't understand Regex at all. – jack3604 Dec 30 '14 at 02:52
  • When there are characters to remove, this function leaves null characters at the end. I think you should do something like this: word = new string(chars, 0, myindex); – Jose M. Jan 27 '20 at 08:50
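
Putting the separate write index together with the new string(chars, 0, myindex) suggestion from the comment gives something like the sketch below. It is one possible cleanup, not the answerer's exact code: LoopFilter and count are names chosen for this example, and the checks use character literals instead of numeric codes. Note that the question's code compares against 44, which is the comma; the apostrophe is 39.

```csharp
public static class LoopFilter
{
    public static string RemoveBadChars(string word)
    {
        char[] chars = new char[word.Length];
        int count = 0; // number of characters kept so far, also the next write position

        foreach (char c in word)
        {
            bool isLetter = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
            if (isLetter || c == '\'')
            {
                chars[count++] = c;
            }
        }

        // Only the first `count` slots were written; the rest are still '\0'.
        return new string(chars, 0, count);
    }
}
```

With this version, LoopFilter.RemoveBadChars("(the") returns "the" with length 3, so no padding characters are left at either end.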
2

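The asker says he wants to remove non-letter characters. This version uses \W, which matches any non-word character, so it also removes apostrophes and keeps digits and underscores: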

public static string RemoveNoneLetterChars(string word)
{
    Regex reg = new Regex(@"\W");
    return reg.Replace(word, " "); // or return reg.Replace(word, String.Empty); 
}
Adel Mourad
0
var result = word.Aggregate(new StringBuilder(word.Length), (acc, c) => acc.Append(Char.IsLetter(c) ? c.ToString() : "")).ToString();

Or you can substitute whatever function in place of IsLetter.
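
One way to make that substitution explicit is to pass the test in as a parameter. The following is a sketch under that assumption; PredicateFilter and Keep are hypothetical names, and a plain loop stands in for Aggregate to avoid allocating a string per character.

```csharp
using System;
using System.Text;

public static class PredicateFilter
{
    // Keeps every character for which `keep` returns true, in the original order.
    public static string Keep(string word, Func<char, bool> keep)
    {
        var sb = new StringBuilder(word.Length);
        foreach (char c in word)
        {
            if (keep(c))
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For the rule in the question, the call would be PredicateFilter.Keep(word, c => char.IsLetter(c) || c == '\''), which turns "(the;':" into "the'".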

Richard Keene