-4

I'm trying to remove strings that contain unrecognized characters from a string collection. What is the best way to accomplish this?

Michael Sallmen
Rade Milovic
  • For example? How do you define "unrecognized characters"? – Oded Oct 24 '12 at 20:10
  • Characters that are not recognized are displayed as a diamond shape with a "?" inside. I assume those characters are Unicode-encoded and ASCII can't represent them. – Rade Milovic Oct 24 '12 at 20:24

5 Answers

1

Since an array (assuming string[]) cannot be resized, removing items means creating a new one anyway. Basic LINQ filtering with ToArray() gives you that new array; ContainsSpecialCharacters below stands for whatever check you use to detect the unwanted characters.

myArray = myArray.Where(s => !ContainsSpecialCharacters(s)).ToArray();
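
The ContainsSpecialCharacters helper is not defined in the answer. A minimal sketch of one possible version follows, assuming "special" means anything outside 7-bit ASCII (which includes U+FFFD, the replacement character usually rendered as a diamond with a "?"). The class name SpecialCharacterFilter and the method RemoveUnrecognized are hypothetical names chosen for illustration.

```csharp
using System;
using System.Linq;

public static class SpecialCharacterFilter
{
    // Hypothetical helper: treats every character above U+007F as "special".
    // This also covers U+FFFD, the Unicode replacement character.
    // Adjust the predicate if your definition of "unrecognized" differs.
    public static bool ContainsSpecialCharacters(string s)
    {
        return s.Any(c => c > 127);
    }

    // Returns a new array that keeps only the strings without special characters.
    public static string[] RemoveUnrecognized(string[] input)
    {
        return input.Where(s => !ContainsSpecialCharacters(s)).ToArray();
    }
}
```

Note that this definition also rejects legitimate accented letters such as "é"; if those should survive, testing only for '\uFFFD' may be closer to what is wanted.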
Alexei Levenkov
1

To remove strings that contain any characters you don't recognize (e.g. if you want to accept only lowercase letters, then "foo@bar" would be rejected):

  1. Create a regular expression which defines the set of "recognized" characters followed by a + quantifier, and starts with ^ and ends with $. For example, if your "recognized" characters are uppercase A through Z, it would be ^[A-Z]+$ (without the +, the pattern would match only single-character strings)
  2. Reject strings that don't match

Note: This won't work for strings that contain newlines, but you can tweak it if you need to support that

To remove strings that consist entirely of characters you don't recognize (e.g. if you want to accept lowercase letters, then "foo@bar" would be accepted because it contains at least one lowercase letter):

  1. Create a regular expression which defines the set of "recognized" characters, but with a ^ character inside the square brackets and a + quantifier after them, and starts with ^ and ends with $. For example, if your "recognized" characters are uppercase A through Z, it would be ^[^A-Z]+$
  2. Reject strings that DO match
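
Both variants can be sketched in C# roughly as follows, assuming the "recognized" set is the lowercase letters a through z used in the examples. The class and method names are hypothetical. The patterns use a + quantifier so that they cover strings of any non-zero length, and \z instead of $ because in .NET $ also matches just before a trailing newline.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexStringFilter
{
    // Matches strings made up only of recognized characters (assumed: a-z).
    private static readonly Regex OnlyRecognized = new Regex(@"^[a-z]+\z");

    // Matches strings made up only of unrecognized characters.
    private static readonly Regex OnlyUnrecognized = new Regex(@"^[^a-z]+\z");

    // Variant 1: reject strings that do NOT match the "recognized" pattern.
    public static List<string> KeepFullyRecognized(IEnumerable<string> items)
    {
        return items.Where(s => OnlyRecognized.IsMatch(s)).ToList();
    }

    // Variant 2: reject strings that DO match the "unrecognized" pattern.
    public static List<string> DropFullyUnrecognized(IEnumerable<string> items)
    {
        return items.Where(s => !OnlyUnrecognized.IsMatch(s)).ToList();
    }
}
```

With the input "foo", "foo@bar", "@@@", the first variant keeps only "foo", while the second keeps "foo" and "foo@bar".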
Orion Edwards
0

I would look at LINQ's Where method, along with a regular expression containing the characters you're looking for. For example, assuming regex is a Regex instance that matches the characters you want to exclude:

return myStringCollection.Where(s => !regex.IsMatch(s));
jtheis
0

This does what you seem to want: it removes every string that contains at least one character from a list of invalid characters.

List<string> strings = new List<string>()
{
    "one",
    "two`",
    "thr^ee",
    "four"
};

List<char> invalid_chars = new List<char>()
{
    '`', '-', '^'
};

strings.RemoveAll(s => s.Any(c => invalid_chars.Contains(c)));
strings.ForEach(s => Console.WriteLine(s));

This generates the output:

one
four
Mike Corcoran
0

This question has some similar answers to what I think you are looking for. However, I think you want to include all letters, numbers, whitespace and punctuation, but exclude everything else. Is that accurate? If so, this should do it for you:

char[] arr = str.ToCharArray();

arr = Array.FindAll<char>(arr, (c => (char.IsLetterOrDigit(c) || 
                      char.IsWhiteSpace(c) || char.IsPunctuation(c))));
str = new string(arr);
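
The snippet above strips the unwanted characters out of a single string. If the goal is instead to drop whole strings from the collection, the same character test can be reused as a filter. The following is a sketch under that assumption; AllowedCharacterFilter, IsAllowed, and RemoveStringsWithDisallowedChars are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class AllowedCharacterFilter
{
    // Same test as above: letters, digits, whitespace, and punctuation are allowed.
    public static bool IsAllowed(char c)
    {
        return char.IsLetterOrDigit(c) || char.IsWhiteSpace(c) || char.IsPunctuation(c);
    }

    // Keeps only the strings in which every character passes IsAllowed.
    public static List<string> RemoveStringsWithDisallowedChars(IEnumerable<string> items)
    {
        return items.Where(s => s.All(IsAllowed)).ToList();
    }
}
```

Because these char methods are Unicode-aware, accented letters such as "é" pass the test, while the replacement character U+FFFD (a symbol, not a letter or punctuation mark) does not.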
davehale23