EDIT: I changed the title to reflect specifically what it is I'm trying to do.

Is there a way to retrieve all alphanumeric (or preferably, just alphabetic) characters for the current culture in .NET? I have several strings from which I need to remove all numerals and non-alphabetic characters, and I'm not sure how to implement this while honoring the alphabets of languages other than English (short of creating arrays of all alphabet characters for every language .NET supports, or at least the languages of our current clients lol).

UPDATE:

Specifically, what I'm trying to do is trim all non-alphabet chars from the start of the string up until the first alphabet character, and then from the last alphabet character to the end of the string. So for a random example in en-US, I want to turn:

()&*1@^#47*^#21%Littering aaaannnnd(*&^1#*32%#**)7(#9&^

into the following:

Littering aaaannnnd

This would be simple enough to do for English since it's my first language, but really I need to be able to remove numerals and other non-alphabetic characters from the string in any culture.

codewario

3 Answers

   string something = "()&*1@^#47*^#21%Littering aaaannnndóú(*&^1#*32%#**)7(#9&^";
   string somethingNew = Regex.Replace(something, @"[^\p{L}-\s]+", "");

Is this what you're looking for?

Edit: Updated to allow characters from other languages. This will output Littering aaaannnndóú

Kevin DeVoe
  • Just noticed you said anything before and after. This example will strip out all illegal characters in between as well. So if Littering aaaannnnd was Li@34tterin 98#45 aaaann$45)nnd it would still come out Littering aaaannnd... Not sure if that will work for you. – Kevin DeVoe Jun 18 '13 at 15:07
  • He said he wanted a solution that works in all cultures, not just for US English. So imagine he wanted Cyrillic characters to be ok in Russian culture, French characters in French, etc. – Shlomo Jun 18 '13 at 15:13
  • Thanks Shlomo, I've updated my answer to accept other languages characters. – Kevin DeVoe Jun 18 '13 at 15:25
  • I'm very fuzzy/green/don't really know much about regular expressions at all. Would you mind explaining how this expression works? – codewario Jun 18 '13 at 15:30
  • \p{L} or \p{Letter}: any kind of letter from any language. \s: whitespace. Putting them in [^...] means it will match anything that is not in the ... set. – Kevin DeVoe Jun 18 '13 at 15:34
  • Does this work with languages that use symbols (e.g. Chinese, Japanese) rather than letters (e.g. English, German, French)? – codewario Jun 18 '13 at 15:42
  • I've never used it in a production environment with Japanese or Korean characters, etc., but I just tested it with a few Japanese characters and it kept them in the string. – Kevin DeVoe Jun 18 '13 at 17:27
  • This is not correct. This will also remove non-alphabetic and numeric characters from the middle of the string. The OP clearly states both in the title and in the body of the question that it should only remove those characters from the start and end of the string. – rory.ap May 02 '16 at 16:53
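
As a rough sketch of the distinction raised in the comments above (stripping non-letters everywhere versus trimming only the ends), the example below anchors the negated letter class \P{L} to the start and end of the string. It is written in Java, which accepts the same \p{L} Unicode category syntax; the class name TrimEnds and the method trimToLetters are hypothetical names chosen for illustration, not part of any answer here.

```java
import java.util.regex.Pattern;

public class TrimEnds {
    // \P{L} matches any character that is NOT a Unicode letter.
    // The first alternative is anchored at the start, the second at the end,
    // so non-letters between the first and last letter are left alone.
    private static final Pattern NON_LETTER_ENDS =
            Pattern.compile("^\\P{L}+|\\P{L}+$");

    // Hypothetical helper: trims leading and trailing non-letters only.
    static String trimToLetters(String input) {
        return NON_LETTER_ENDS.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // The question's sample string.
        System.out.println(trimToLetters(
                "()&*1@^#47*^#21%Littering aaaannnnd(*&^1#*32%#**)7(#9&^"));
        // Non-letters in the middle survive; only the ends are trimmed.
        System.out.println(trimToLetters("42!Li@34tterin 98#45 aaaann$45)nnd??"));
    }
}
```

With this pattern a string that contains no letters at all comes back empty, since the start-anchored alternative consumes the whole input.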

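Using a regex, this should work:

// Requires: using System.Text.RegularExpressions;
string input = "()&*1@^#47*^#21%Littering aaaannnnd(*&^1#*32%#**)7(#9&^";
// Trims from the start and end only. [a-zA-Z] covers ASCII letters; use \p{L} to accept letters from any language.
string result = Regex.Replace(input, "(?:^[^a-zA-Z]*|[^a-zA-Z]*$)", "");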
dotINSolution

Without using regex, in Java you could do:

static String trimNonLetters(String word) {
    int start = 0;
    int end = word.length();
    // Skip leading non-letters (front first).
    while (start < end && !Character.isLetter(word.charAt(start))) {
        start++;
    }
    // Then skip trailing non-letters (end second).
    while (end > start && !Character.isLetter(word.charAt(end - 1))) {
        end--;
    }
    // Empty string if the input contained no letters at all.
    return word.substring(start, end);
}

If you are using something other than Java, substituting Character.isLetter is straightforward: look for the equivalent character-classification function in your platform (in .NET, char.IsLetter). Failing that, you can look up the integer values of the alphabetic characters in the character encoding and test against those ranges.

Snowman
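
The loop-based approach above tests one char at a time, which in Java means one UTF-16 code unit, so a letter stored as a surrogate pair is not recognized. As a hedged sketch, assuming such letters should count as well, the same idea can step by code point instead. The class name CodePointTrim and the method trimNonLetters are hypothetical names used only for illustration.

```java
public class CodePointTrim {
    // Trims leading and trailing non-letters, stepping by Unicode code point
    // so that letters encoded as surrogate pairs are treated as letters.
    static String trimNonLetters(String s) {
        int start = 0;
        int end = s.length();
        // Advance past leading non-letter code points.
        while (start < end) {
            int cp = s.codePointAt(start);
            if (Character.isLetter(cp)) {
                break;
            }
            start += Character.charCount(cp);
        }
        // Back up past trailing non-letter code points.
        while (end > start) {
            int cp = s.codePointBefore(end);
            if (Character.isLetter(cp)) {
                break;
            }
            end -= Character.charCount(cp);
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(trimNonLetters(
                "()&*1@^#47*^#21%Littering aaaannnnd(*&^1#*32%#**)7(#9&^"));
        // U+1D400 (MATHEMATICAL BOLD CAPITAL A) is a letter outside the BMP,
        // written here as the surrogate pair \uD835\uDC00.
        System.out.println(trimNonLetters("123\uD835\uDC00bc!!").length());
    }
}
```

The second call keeps the surrogate pair plus "bc", so the printed length counts four UTF-16 code units; a char-by-char check would have dropped the pair as two non-letters.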