
I want to define a 'valid' input in which the characters `_`, `-` and `.` are not allowed at the beginning or at the end of the string, only in the middle.

Acceptable characters (location doesn't matter): a-zA-Z0-9 and all the Hebrew letters, which I don't know how to allow in a regex (maybe by just hard-coding all the letters?).

Unacceptable characters (location doesn't matter): all other symbols, i.e. everything except the three special characters listed above.

I don't know how to build this pattern. If you can, please add tips and comments on every section so that I can understand it. Thanks!

This is not for homework, just for self learning.

Matt
Novak
  • The permitted symbols, can they be one character away from the ends? Can there be two next to each other? (Would “`A...Z`” be something you'd want valid?) – Donal Fellows Apr 21 '12 at 09:35
  • That's a great point, thanks for pointing that out: `a..b` is invalid, but `'a.b'` is valid. Using the pattern @Yorye provided, how can I apply this setting? Provided Pattern: `@"^[a-zA-Z\dא-ת][\s\w\.א-ת\-]*[a-zA-Z\dא-ת]$"` – Novak Apr 21 '12 at 09:40

1 Answer

@"^[a-zA-Z\dא-ת][a-zA-Z_\-\.\dא-ת]*[a-zA-Z\dא-ת]$"
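Since the question asks for comments on every section, here is a sketch of the same pattern spread over several lines. It assumes `RegexOptions.IgnorePatternWhitespace`, which lets a pattern contain layout whitespace and `#` comments; the class name `AnnotatedPattern` and the method `IsValid` are made up for illustration only.

```csharp
using System.Text.RegularExpressions;

public static class AnnotatedPattern
{
    // The first pattern from this answer, one section per line.
    // With IgnorePatternWhitespace, whitespace outside character classes is
    // ignored and everything after '#' up to the end of the line is a comment.
    private const string Pattern = @"
        ^                      # start of the input
        [a-zA-Z\dא-ת]          # first character: a Latin letter, a digit or a Hebrew letter
        [a-zA-Z_\-\.\dא-ת]*    # middle: zero or more letters, digits or the three allowed symbols
        [a-zA-Z\dא-ת]          # last character: again a letter or a digit, never a symbol
        $                      # end of the input
    ";

    public static bool IsValid(string input)
    {
        return Regex.IsMatch(input, Pattern, RegexOptions.IgnorePatternWhitespace);
    }
}
```

The range `א-ת` covers the Hebrew letters, including the final forms, so there is no need to hard-code each one. Because the first and last sections each consume one character, this pattern needs at least two characters to match. Also note that in .NET `$` matches just before a trailing newline as well; `\z` is the stricter alternative if that matters.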

If you want to allow "_.-" without duplicates:

@"^[a-zA-Z\dא-ת]([_\.\-]?[a-zA-Z\dא-ת])+$"

If you want to allow white spaces in the middle:

@"^[a-zA-Z\dא-ת][a-zA-Z_\-\.\d\sא-ת]*[a-zA-Z\dא-ת]$"

If you want white spaces + "_.-" without duplicates:

@"^[a-zA-Z\dא-ת]([_\.\-]?[a-zA-Z\d\sא-ת])*[_\.\-]?[a-zA-Z\dא-ת]$"

So using the Regex:

var isValid = Regex.IsMatch(input, @"...");

Also, if you plan on using the regex many times in the code, I suggest adding the RegexOptions.Compiled flag. It makes matching faster, at the cost of a slower first-time construction.

var isValid = Regex.IsMatch(input, @"...", RegexOptions.Compiled);
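When the same pattern is used repeatedly, one common option is to build the `Regex` once and keep it in a static field, so the compilation cost is paid a single time. The following is only a sketch: the class name `InputValidator` is hypothetical, and the pattern is one possible way (an assumption of this example) to express "letters and digits, with single `_`, `.` or `-` characters allowed only between them".

```csharp
using System.Text.RegularExpressions;

public static class InputValidator
{
    // Built once and reused on every call.
    // Pattern (an illustrative assumption): a letter or digit, followed by one or
    // more letters/digits, each optionally preceded by exactly one symbol.
    private static readonly Regex Valid = new Regex(
        @"^[a-zA-Z\dא-ת](?:[_.\-]?[a-zA-Z\dא-ת])+$",
        RegexOptions.Compiled);

    public static bool IsValid(string input)
    {
        return Valid.IsMatch(input);
    }
}
```

With this sketch, `InputValidator.IsValid("a.b")` is true, while `"a..b"`, `".ab"` and `"ab."` are rejected.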
SimpleVar
  • I tested your earlier code and it wasn't working, but the new one does. What does '\w' mean in a regex? And what is the '*' between the middle and last sections used for? Thanks for your help. – Novak Apr 21 '12 at 09:34
  • \w stands for "a-zA-Z0-9_", and * means that the [] section before it can appear zero times or more. – SimpleVar Apr 21 '12 at 09:34
  • @GuyDavid I edited it again because I forgot to allow dot in the middle. This is now final. – SimpleVar Apr 21 '12 at 09:36
  • Your description of \w isn't entirely true and it matches a lot more than you might think. The old `[A-Za-z0-9_]` is only true when you're using the ECMAScript compatible version of regex in .NET. By default \w matches a range of characters due to the fact that it uses Unicode matching. See: http://msdn.microsoft.com/en-us/library/20bw873z.aspx#WordCharacter – jessehouwing Apr 21 '12 at 10:33
  • @jessehouwing Thanks for the note! Fixed. – SimpleVar Apr 21 '12 at 10:35
  • @Yorye and if I would like to limit that allowed symbol ('_.-') to appear only once, between letters? – Novak Apr 21 '12 at 11:19
  • @Yorye You have been very helpful for me understanding Regex basis. Thank you – Novak Apr 21 '12 at 12:05
  • @GuyDavid No problem. You can find all regex signs and special chars [here](http://www.mikesdotnetting.com/Article/46/CSharp-Regular-Expressions-Cheat-Sheet). You can add me on [facebook](http://www.facebook.com/yoryenathan) if you want, so you can ask me little questions if you're stuck or if you want to learn something new. – SimpleVar Apr 21 '12 at 12:10
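The point raised in the comments about `\w` can be checked with a small sketch like the one below. It compares the default .NET behaviour with `RegexOptions.ECMAScript`; the class and method names are hypothetical and exist only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class WordCharDemo
{
    // Default .NET behaviour: \w is Unicode-aware, so it also matches
    // letters outside a-z/A-Z, such as Hebrew letters.
    public static bool IsWordCharDefault(string s)
    {
        return Regex.IsMatch(s, @"^\w$");
    }

    // ECMAScript-compatible behaviour: \w is limited to [a-zA-Z0-9_].
    public static bool IsWordCharEcmaScript(string s)
    {
        return Regex.IsMatch(s, @"^\w$", RegexOptions.ECMAScript);
    }
}
```

For example, `"א"` is a word character under the default options but not under `RegexOptions.ECMAScript`, while `"a"` and `"_"` match in both modes. This is why the patterns in the answer spell out the allowed ranges explicitly instead of relying on `\w`.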