10

I would like to check a string for invalid characters, meaning characters that should not be there. Which characters those are varies, but I think that's not important. What matters is how I should do it, and what the easiest and best-performing way is.

Let's say I only want strings that contain 'A-Z', ' ' (space), '.', '$', and '0-9'.

So a string like "HELLO STaCKOVERFLOW" is invalid because of the 'a'. How do I do that? I could make a List<char> containing every character that is not allowed and check the string against it. That's probably not a good idea, because there would be a lot of characters. But I could make a list of all the allowed characters instead, right? And then? Do I have to compare every character in the string against the List<char>? Is there any smart code for this? And another question: if I add A-Z to the List<char>, I have to add 26 characters manually, but as far as I know these are 65-90 in the ASCII table. Can I add them more easily? Any suggestions? Thank you.

silla
  • 1
    You could use your idea of a list of chars together with the string's IndexOf, or use a regex. – Ademar Sep 10 '12 at 11:33
  • 1
    Please ask only one question per SO question. If you have two questions, ask two SO questions. Thank you. – O. R. Mapper Sep 10 '12 at 11:33

6 Answers

21

You can use a regular expression for this:

Regex r = new Regex("[^A-Z0-9.$ ]");
if (r.IsMatch(SomeString)) {
    // validation failed: SomeString contains a character outside the allowed set
}
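
For reference, here is a minimal self-contained sketch of this check applied to the question's character set. It uses the character class without a trailing `$` anchor, so every position in the string is tested rather than only the last one. The class name `AllowedCharsRegex` and its method names are made up for this illustration, and the whitelist pattern `\A[A-Z0-9.$ ]*\z` is an equivalent alternative, not something the answer itself proposes.

```csharp
using System.Text.RegularExpressions;

public static class AllowedCharsRegex
{
    // Matches any single character outside the allowed set
    // (A-Z, 0-9, '.', '$', space). Inside [...] the '.' and '$' are literals.
    private static readonly Regex Invalid = new Regex("[^A-Z0-9.$ ]");

    // Whitelist form: the whole string must consist of allowed characters.
    // \A and \z anchor to the absolute start and end of the input.
    private static readonly Regex OnlyValid = new Regex(@"\A[A-Z0-9.$ ]*\z");

    // True if s contains at least one disallowed character.
    public static bool HasInvalidChar(string s) => Invalid.IsMatch(s);

    // True if s is empty or contains only allowed characters.
    public static bool IsValid(string s) => OnlyValid.IsMatch(s);
}
```

With these definitions, `AllowedCharsRegex.HasInvalidChar("HELLO STaCKOVERFLOW")` reports a problem because of the lowercase 'a', while the all-uppercase version of the same string passes.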

To create a list of characters from A-Z or 0-9 you would use a simple loop:

for (char c = 'A'; c <= 'Z'; c++) {
    // c or c.ToString() depending on what you need
}

But you don't need that with the Regex - pretty much every regex engine understands the range syntax (A-Z).
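
If you do want the list-based approach from the question, the loop above can fill a `HashSet<char>` once, so each lookup is a hash probe instead of a scan through a list. This is only a sketch: the class name `AllowedCharSet` and its members are hypothetical, and the allowed set is taken from the question.

```csharp
using System.Collections.Generic;

public static class AllowedCharSet
{
    // Built once; HashSet<char>.Contains is O(1) on average.
    private static readonly HashSet<char> Allowed = BuildAllowed();

    private static HashSet<char> BuildAllowed()
    {
        var set = new HashSet<char> { ' ', '.', '$' };
        for (char c = 'A'; c <= 'Z'; c++) set.Add(c);   // 26 letters
        for (char c = '0'; c <= '9'; c++) set.Add(c);   // 10 digits
        return set;
    }

    // Number of distinct allowed characters (3 + 26 + 10).
    public static int AllowedCount => Allowed.Count;

    // True if every character of s is in the allowed set.
    public static bool IsValid(string s)
    {
        foreach (char c in s)
        {
            if (!Allowed.Contains(c)) return false;   // stop at the first bad character
        }
        return true;
    }
}
```

The check stops at the first disallowed character, so an invalid string is rejected without scanning the rest of it.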

ThiefMaster
  • Ah, that's probably much easier than comparing against a list. Good idea – silla Sep 10 '12 at 11:34
  • One question about this: shouldn't it be if (!r.IsMatch(SomeString)) => validation failed? Because if it matches, validation is fine, isn't it? – silla Sep 10 '12 at 12:15
  • 1
    No, the regex matches if the string contains any character that is *not* in the `[A-Z0-9.$ ]` character class - that's much more efficient since the regex engine can stop as soon as it finds such a character. – ThiefMaster Sep 10 '12 at 12:26
  • Ah, I was confused because I have never used regex in C#. So the '^' at the start of the regex is something like "!" (not)? – silla Sep 10 '12 at 12:50
  • 1
    Yes, it negates the character class. – ThiefMaster Sep 10 '12 at 13:07
0

I have just written such a function, along with an extended version that restricts the first and last characters when needed. The original function only checks whether the string consists of valid characters. The extended function takes two additional integers: how many characters at the beginning of the valid-character list to skip when checking the first character and the last character, respectively. In practice it simply calls the original function up to three times. In the example below, it ensures that the string begins with a letter and doesn't end with an underscore.

StrChr(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
StrChrEx(String, "_0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ", 11, 1);


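#include <windows.h>   /* BOOL, CHAR, UINT, TRUE, FALSE (assumed source of these types) */
#include <string.h>    /* strlen */

/* Returns TRUE if every character of str occurs in chars. */
BOOL __cdecl StrChr(const CHAR* str, const CHAR* chars)
{
    for (int s = 0; str[s] != 0; s++)
    {
        int c = 0;

        /* Scan the list of valid characters for str[s]. */
        while (TRUE)
        {
            if (chars[c] == 0)
            {
                return FALSE;   /* end of list reached: str[s] is not valid */
            }
            else if (str[s] == chars[c])
            {
                break;          /* found, move on to the next character */
            }
            else
            {
                c++;
            }
        }
    }

    return TRUE;
}

/* Like StrChr, but the first character must also occur in chars + excl_first
   and the last character in chars + excl_last (0 disables that check). */
BOOL __cdecl StrChrEx(const CHAR* str, const CHAR* chars, UINT excl_first, UINT excl_last)
{
    size_t len = strlen(str);

    /* Guard added here: an empty string has no first or last character,
       and str[len - 1] would read out of bounds. */
    if (len == 0)
    {
        return TRUE;
    }

    CHAR first[2] = {str[0], 0};
    CHAR last[2]  = {str[len - 1], 0};

    if (!StrChr(str, chars))
    {
        return FALSE;
    }

    if (excl_first != 0)
    {
        if (!StrChr(first, chars + excl_first))
        {
            return FALSE;
        }
    }

    if (excl_last != 0)
    {
        if (!StrChr(last, chars + excl_last))
        {
            return FALSE;
        }
    }

    return TRUE;
}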
0

If you are using C#, you can do this easily with a List and Contains. It works the same way for single characters (as one-character strings) and for multi-character strings.

var pn = "The String To ChecK";
var badStrings = new List<string>()
{
    " ", "\t", "\n", "\r"
};
foreach (var badString in badStrings)
{
    if (pn.Contains(badString))
    {
        // Do something
    }
}
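
When the forbidden items are all single characters, as in the whitespace list above, `string.IndexOfAny` performs the same scan in one call and also reports where the first offender is. A small sketch, with the hypothetical names `BadCharCheck` and `FirstBadIndex`:

```csharp
public static class BadCharCheck
{
    // The same whitespace characters as in the list above, as chars.
    private static readonly char[] BadChars = { ' ', '\t', '\n', '\r' };

    // Returns the index of the first disallowed character, or -1 if there is none.
    public static int FirstBadIndex(string s) => s.IndexOfAny(BadChars);
}
```

For "The String To ChecK" this returns 3, the position of the first space.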
0

If you're not super good with regular expressions, then there is another way to go about this in C#. Here is a block of code I wrote to test a string variable named notifName:

var alphabet = "a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z";
var numbers = "0,1,2,3,4,5,6,7,8,9";
var specialChars = " ,(,),_,[,],!,*,-,.,+";
var validChars = (alphabet + "," + alphabet.ToUpper() + "," + numbers + "," + specialChars).Split(',');
for (int i = 0; i < notifName.Length; i++)
{
    if (Array.IndexOf(validChars, notifName[i].ToString()) < 0) {
        errorFound = $"Invalid character '{notifName[i]}' found in notification name.";
        break;
    }
}

You can change the characters added to the array as needed. The Array.IndexOf method is the key to the whole thing. Of course, if you want commas to be valid, you would need to choose a different split character.

0

Not enough reps to comment directly, but I recommend the Regex approach. One small caveat: you probably need to anchor both ends of the input string, and you will want at least one character to match. So (with thanks to ThiefMaster), here's my regex to validate user input for a simple arithmetical calculator (plus, minus, multiply, divide):

Regex r = new Regex(@"^[0-9\.\-\+\*\/ ]+$");
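
A short sketch of how this anchored pattern behaves. The class and method names here are invented for the example. One detail worth knowing: in .NET, `$` also matches just before a final newline, so input ending in "\n" still passes. If that matters, `\A` and `\z` anchor to the absolute start and end; that stricter variant is my addition, not part of the pattern above.

```csharp
using System.Text.RegularExpressions;

public static class CalculatorInput
{
    // The pattern from the answer: one or more digits, dots, operators or spaces.
    private static readonly Regex Lenient = new Regex(@"^[0-9\.\-\+\*\/ ]+$");

    // Same character class, but \z does not tolerate a trailing newline.
    private static readonly Regex Strict = new Regex(@"\A[0-9\.\-\+\*\/ ]+\z");

    public static bool IsValidLenient(string s) => Lenient.IsMatch(s);
    public static bool IsValidStrict(string s) => Strict.IsMatch(s);
}
```

Both variants accept "1 + 2.5 * 3" and reject "1+a" and the empty string. They differ only on "1+2\n", which the `$` version accepts and the `\z` version rejects.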
0

I'd go with a regex, but still need to add my 2 cents here, because all the proposed non-regex solutions are O(MN) in the worst case (string is valid) which I find repulsive for religious reasons.

Even more so when LINQ offers a simpler and more efficient solution than nesting loops:

var isInvalid = "The String To Test".Intersect("ALL_INVALID_CHARS").Any();
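
The line above is a blacklist check. For the whitelist the question asks about, `Except` gives the mirror image: it yields the characters of the input that are not in the allowed set. A sketch with hypothetical names (`LinqCharCheck`, `AllowedChars`), using the allowed set from the question:

```csharp
using System.Linq;

public static class LinqCharCheck
{
    // Allowed set from the question: A-Z, 0-9, '.', '$' and space.
    private const string AllowedChars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789.$ ";

    // Blacklist form: true if s shares at least one character with invalidChars.
    public static bool ContainsAny(string s, string invalidChars) =>
        s.Intersect(invalidChars).Any();

    // Whitelist form: true if no character of s falls outside AllowedChars.
    public static bool IsValid(string s) => !s.Except(AllowedChars).Any();
}
```

Both `Intersect` and `Except` build a hash set from the second sequence, which is what avoids the nested-loop cost.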
Good Night Nerd Pride