Special characters in regex

Question

I have a pattern which looks like this (I'm not the creator):

[a-z,A-z,0-9,\-,\+,\&,\/,\\\,\s]{1,127}

which I pass to Regex.IsMatch().

Is there a "better" way of writing the same expression? And by better I mean shorter.

And if I would like to add a special character like æ, do I simply add \æ?

That regular expression is... dubious. Anyway, it is equivalent to: `[-.,+&/\\\sA-z]{1,127}`. 1) Removed duplicates (which are meaningless in a character class) 2) Put - at the start (which removes its special meaning) 3) Removed unnecessary escapes 4) Removed range overlap (A-z is different from A-Z and overlaps a-z). — user2246674, Jun 17 '13 at 17:31
That's a bizarre Regex. It duplicates `.` and `,` multiple times, and I'm not sure if those characters were intended to be allowed. Also `A-z` gets some unintended characters, or maybe intended characters...nobody can know without a comment. — user7116, Jun 17 '13 at 17:32
It is a very strange regexp. I'm not an expert in C#, but I think that using \ in a class `[]` is not necessary except when escaping `]` or using a metaclass like `\s`. The repeated usage of `.` and `,` also seems useless to me, unless it is meant to make the pattern easier to read; commas are not needed as separators in a class `[]`. Weird either way. — user2468222, Jun 17 '13 at 17:33
@user2468222 I removed the dots. This is what it really looks like. — Johan, Jun 17 '13 at 17:35
@Johan Then remove the dot in my equivalent rewrite - the other points stand. At this point, I would go back to the *requirements* to see what *should* be matched. — user2246674, Jun 17 '13 at 17:36
@user2246674 Ok, is `[-.,+&wæøå/\\\sA-z]{1,127}` correct if I want to add the special characters? — Johan, Jun 17 '13 at 17:37
@user2246674 Don't you lose `a-z` if case is not specifically ignored? Edit: OK, I missed the capital A, small z... sorry. — user2468222, Jun 17 '13 at 17:37
@user2468222 A-z covers *all* characters between A and z: [A-Z, \[, \ , ^, _, `, \], a-z](http://www.asciitable.com). The insensitivity is applied during the match after the character class range is built. — user2246674, Jun 17 '13 at 17:38

score 1 · Accepted Answer · answered Jun 17 '13 at 17:38

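You can start by removing the duplicates. This includes the repeated commas, as well as the a-z and the \, because both already fall inside the A-z range (which spans every ASCII character from A to z, including the punctuation that sits between Z and a).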

You also do not have to escape most characters, and you can pull the - to the front of the character class to avoid escaping that one too.

This leaves you with:

[-+&/,A-z0-9\s]{1,127}
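
As a rough check of the simplification above, the sketch below compares which ASCII characters each character class accepts. It uses Python's `re` module purely for illustration; the thread is about .NET's `Regex.IsMatch`, and the assumption here is that these ASCII character-class constructs behave the same way in both engines. The helper name `accepted_ascii` is made up for this example.

```python
import re

# Pattern from the question and the simplified pattern from this answer.
ORIGINAL = re.compile(r"[a-z,A-z,0-9,\-,\+,\&,\/,\\\,\s]{1,127}")
SIMPLIFIED = re.compile(r"[-+&/,A-z0-9\s]{1,127}")


def accepted_ascii(pattern):
    """Return the set of single ASCII characters the pattern fully matches."""
    return {chr(code) for code in range(128) if pattern.fullmatch(chr(code))}


original_set = accepted_ascii(ORIGINAL)
simplified_set = accepted_ascii(SIMPLIFIED)

# Both classes should accept exactly the same ASCII characters.
same = original_set == simplified_set

# Characters that A-z lets in besides letters: everything between Z and a.
extras = sorted(c for c in simplified_set if "Z" < c < "a")

print(same)
print(extras)
```

The `extras` list makes the side effect of `A-z` visible: the range also admits the punctuation between `Z` and `a`, which is why the explicit backslash in the original pattern is redundant. If those characters are not wanted, `A-Za-z` would be the narrower choice, but that changes what the pattern accepts.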

lc.

Thanks, and if I want to add the characters "æ ø å"? `[-+&/æøå,A-z0-9\s]{1,127}`? – Johan Jun 17 '13 at 17:40
@Johan What about other Unicode characters from different languages? – user2246674 Jun 17 '13 at 17:41
@user2246674 Is it even possible to make a generic regex for that? – Johan Jun 17 '13 at 17:41
@Johan You may be interested in [Unicode Categories](http://msdn.microsoft.com/en-us/library/20bw873z.aspx#CategoryOrBlock) - i.e. `\p{Lu}` (case insensitive) matches all letters; English, Swedish, or otherwise. – user2246674 Jun 17 '13 at 17:45
@Johan Yes, you just add whatever else you want to match into the class, although honestly I think it's time to go back to the *requirement* and see what you're trying to match. – lc. Jun 17 '13 at 17:45
@lc. Thank you. Yeah, that will be my next step. – Johan Jun 17 '13 at 17:46
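
To illustrate the point from the comments that extra letters can simply be added to the class without escaping, here is a minimal sketch using the extended pattern Johan proposed. It is written in Python for illustration only, on the assumption that .NET treats these literal characters the same way. The helper `is_valid` is hypothetical, and it assumes the whole string should be validated, which is why it uses `fullmatch`; an unanchored `Regex.IsMatch` call would succeed as soon as any allowed character appears somewhere in the input.

```python
import re

# Extended pattern from the comment thread: æ, ø and å are added unescaped.
EXTENDED = re.compile(r"[-+&/æøå,A-z0-9\s]{1,127}")


def is_valid(text):
    """Hypothetical helper: True if the whole string consists of allowed characters."""
    return EXTENDED.fullmatch(text) is not None


print(is_valid("blåbær & øl"))  # lowercase æ, ø, å are in the class
print(is_valid("BLÅBÆR"))       # uppercase Å and Æ were not added
```

Only the lowercase forms were added, so the uppercase letters are rejected unless they are listed too or the match is made case insensitive.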