
I have a pattern which looks like this (I'm not the creator):

[a-z,A-z,0-9,\-,\+,\&,\/,\\\,\s]{1,127}

which I pass to Regex.IsMatch().

Is there a "better" way of writing the same expression? And by better I mean shorter.

And if I would like to add a special character like æ, do I simply add it?

Johan
  • That regular expression is .. dubious. Anyway, it is equivalent to: `[-.,+&/\\\sA-z]{1,127}`. 1) Removed duplicates (which are meaningless in a character class) 2) Put - at start (which removes its special meaning) 3) Removed unnecessary escapes 4) Removed range overlap (A-z is different from A-Z and overlaps a-z). – user2246674 Jun 17 '13 at 17:31
  • That's a bizarre Regex. It duplicates `.` and `,` multiple times, and I'm not sure if those characters were intended to be allowed. Also `A-z` gets some unintended characters, or maybe intended characters...nobody can know without a comment. – user7116 Jun 17 '13 at 17:32
  • It is a very strange regexp. I'm not an expert in C#, but I think that using `\` in a class `[]` is not necessary except when escaping `]` or using a metaclass like `\s`. The repeated `.` and `,` also seem useless to me, unless they are there to make the pattern easier to read. Commas are not needed as separators in a class `[]`. – user2468222 Jun 17 '13 at 17:33
  • @user7116 The dots should be commas, my bad. – Johan Jun 17 '13 at 17:34
  • @user2468222 I removed the dots. This is what it really looks like. – Johan Jun 17 '13 at 17:35
  • @Johan Then remove the dot in my equivalent rewrite - the other points stand. At this point, I would go back to the *requirements* to see what *should* be matched. – user2246674 Jun 17 '13 at 17:36
  • @user2246674 Ok, is `[-.,+&wæøå/\\\sA-z]{1,127}` correct if I want to add the special characters? – Johan Jun 17 '13 at 17:37
  • @user2246674 Aren't you missing `a-z` if case is not specifically ignored? Edit: OK, I missed the capital A, small z... sorry. – user2468222 Jun 17 '13 at 17:37
  • @user2468222 A-z covers *all* characters between A and z: [A-Z, \[, \ , ^, _, `, \], a-z](http://www.asciitable.com). The insensitivity is applied during the match after the character class range is built. – user2246674 Jun 17 '13 at 17:38
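
To make the point about the `A-z` range concrete, here is a minimal sketch that lists the non-letter characters it lets through. It is written in Python rather than C# purely for illustration; the variable names are made up for this example, and the code only relies on ASCII code point order, which is the same in .NET.

```python
import re

# Every character whose code point lies between 'A' and 'z' inclusive,
# keeping only the ones that are not letters.
extras = [chr(c) for c in range(ord("A"), ord("z") + 1) if not chr(c).isalpha()]
print(extras)

# The same characters are accepted by the [A-z] class in Python's regex engine.
a_to_z = re.compile(r"[A-z]")
assert all(a_to_z.fullmatch(ch) for ch in extras)
```

If those six characters are not meant to be allowed, `A-Za-z` is the range to use instead.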

1 Answer


You can start by removing the duplicates. This includes the repeated commas, as well as the a-z, because it is already covered by the A-z range, as is the backslash (\). Note that A-z also covers [, ], ^, _ and `, which sit between Z and a.

You also do not have to escape most characters, and you can pull the - to the front of the character class to avoid escaping that one too.

This leaves you with:

[-+&/,A-z0-9\s]{1,127}
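
As a sanity check, the following sketch compares which single characters the original and the shortened class accept. It uses Python's `re` module rather than .NET, so treat it as an approximation: the definition of `\s` can differ slightly between engines, but both patterns use the same `\s`, so the comparison still holds. The names `ORIGINAL`, `SIMPLIFIED` and `accepted` are invented for this example.

```python
import re

# Single-character versions of both classes (the {1,127} quantifier is
# left off because only the set of accepted characters is compared).
ORIGINAL = re.compile(r"[a-z,A-z,0-9,\-,\+,\&,\/,\\\,\s]")
SIMPLIFIED = re.compile(r"[-+&/,A-z0-9\s]")

def accepted(pattern, limit=0x10000):
    """Return the set of characters below `limit` that the class matches."""
    return {chr(c) for c in range(limit) if pattern.fullmatch(chr(c))}

same = accepted(ORIGINAL) == accepted(SIMPLIFIED)
print(same)

# Characters outside the class, such as æ, are rejected until they are added.
with_nordic = re.compile(r"[-+&/æøå,A-z0-9\s]")
print(SIMPLIFIED.fullmatch("æ") is None, with_nordic.fullmatch("æ") is not None)
```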
lc.
  • Thanks, and if I want to add the characters "æ ø å"? `[-+&/æøå,A-z0-9\s]{1,127}`? – Johan Jun 17 '13 at 17:40
  • @Johan What about other Unicode characters from different languages? – user2246674 Jun 17 '13 at 17:41
  • @user2246674 Is it even possible to make a generic regex for that? – Johan Jun 17 '13 at 17:41
  • @Johan You may be interested in [Unicode Categories](http://msdn.microsoft.com/en-us/library/20bw873z.aspx#CategoryOrBlock) - i.e. `\p{Lu}` (case insensitive) matches all letters; English, Swedish, or otherwise. – user2246674 Jun 17 '13 at 17:45
  • @Johan Yes, you just add whatever else you want to match into the class, although honestly I think it's time to go back to the *requirement* and see what you're trying to match. – lc. Jun 17 '13 at 17:45
  • @lc. Thank you. Yeah, that will be my next step. – Johan Jun 17 '13 at 17:46
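
The Unicode category idea from the comments can be sketched without listing æ, ø and å by hand. Python's built-in `re` module does not support `\p{...}`, so this example uses `unicodedata.category` as a stand-in for the letter categories. The helper names and the exact set of allowed punctuation are assumptions made for illustration, and the sketch assumes the whole string must consist of allowed characters.

```python
import unicodedata

ALLOWED_PUNCTUATION = set("-+&/,")  # assumed set, taken from the shortened pattern

def is_letter(ch):
    """True for any character in a Unicode letter category (Lu, Ll, Lt, Lm, Lo)."""
    return unicodedata.category(ch).startswith("L")

def is_allowed(text):
    """Check length 1..127 and that every character is a letter, an ASCII
    digit, whitespace, or one of the assumed punctuation characters."""
    if not 1 <= len(text) <= 127:
        return False
    return all(
        is_letter(ch) or ch in "0123456789" or ch.isspace() or ch in ALLOWED_PUNCTUATION
        for ch in text
    )

print(is_allowed("Blåbærsyltetøy 2"), is_allowed("a_b"), is_allowed(""))
```

Unlike the `A-z` range, this version does not accept `[`, `\`, `]`, `^`, `_` or the backtick, which is one more reason to settle the requirement first.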