Is there any way to define a custom character class in a C# regex?

In flex it is done in a very obvious way:

DIGIT    [0-9]
%%
{DIGIT}+    {printf( "An integer: %s (%d)\n", yytext, atoi( yytext ) );}

http://westes.github.io/flex/manual/Simple-Examples.html#Simple-Examples

As explained in this answer, in PHP defining a custom character class works like this:

(?(DEFINE)(?<a>[acegikmoqstz@#&]))\g<a>(?:.*\g<a>){2}

Is there a way to achieve this result in C# without repeating the full character class definition each time it is used?

rici
PiotrB
  • @Rawling: It's the same kind of question, but the point is: how to do it (if possible) in C#. – PiotrB Aug 17 '14 at 11:20
  • Reason for my *reopen vote*: The answer in the linked duplicate **does not address C#** at all, it explicitly only deals with Java and PHP. The solutions presented there are not applicable for C# (@Rawling) – HugoRune Aug 20 '14 at 14:45
  • @HugoRune Good point, I thought both the answers were just language-specific versions of string concatenation but the PHP one is doing something special. There is a [C# specific question here](http://stackoverflow.com/questions/8204214/regex-reusing-subexpressions) and I expect most answers you attract will be along the same lines. – Rawling Aug 20 '14 at 15:00
  • @Rawling Yes, I don't think a better solution exists either. But I was googling for this problem, and this question here seemed to be the only applicable result, so a definitive answer here should be useful to future visitors, even if it is a negative one. – HugoRune Aug 20 '14 at 15:04
  • It may be possible to use named blocks and class subtraction to get the same effect, or there may be a named block that already matches the required characters – Panagiotis Kanavos Aug 20 '14 at 15:13
  • Thanks for the better version of the question, @HugoRune – PiotrB Aug 20 '14 at 15:21

2 Answers

Custom character classes aren't supported in C#, but you may be able to use named blocks and character class subtraction to get a similar effect.

.NET defines a large number of Unicode general categories and named blocks, selected with \p{name}, that cover groups of characters such as math or Greek symbols. There may be one that already matches your requirements.

Character class subtraction allows you to exclude the characters in one class or block from the characters in a broader class. The syntax is:

[ base_group -[ excluded_group ]]

The following example, copied from the linked documentation, matches all characters from \u0000 through \uFFFF except whitespace, punctuation, characters in the Greek block, and the next-line control character (\x85):

[\u0000-\uFFFF-[\s\p{P}\p{IsGreek}\x85]]
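A smaller illustration may make the syntax easier to read. The sketch below is not from the linked documentation; the class name `SubtractionDemo` and the sample strings are made up for the example. It uses subtraction to build a "consonants" class and `\p{IsGreek}` to select a named block.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SubtractionDemo
{
    // Lowercase ASCII letters minus the vowels, i.e. consonants only.
    public static readonly Regex Consonants = new Regex("^[a-z-[aeiou]]+$");

    // One or more characters from the Greek named block (U+0370 to U+03FF).
    public static readonly Regex Greek = new Regex(@"^\p{IsGreek}+$");

    public static void Main()
    {
        Console.WriteLine(Consonants.IsMatch("rhythm"));        // True
        Console.WriteLine(Consonants.IsMatch("regex"));         // False: contains 'e'
        Console.WriteLine(Greek.IsMatch("\u03B1\u03B2\u03B3")); // True: alpha, beta, gamma
    }
}
```

Note that this only narrows an existing class inline; it still does not give the class a reusable name inside the pattern.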
Panagiotis Kanavos

Nope, not supported in C#. This link will give you a nice overview of the .NET Regex engine. Note that nothing really stops you from defining variables and using them to construct your Regex string:

// Requires: using System.Text.RegularExpressions;
var digit = "[0-9]";
var regex = new Regex(digit + "[A-Z]"); // same pattern as "[0-9][A-Z]"
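Applied to the pattern from the question, the same idea could look like the sketch below. The names `ReusedClassDemo`, `A`, and `ThreeFromClass` are invented for illustration, and string interpolation is used in place of concatenation. The resulting pattern is intended to match the same strings as the PHP version, though the engine sees the expanded class three times instead of a subroutine call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReusedClassDemo
{
    // The character class is written once and spliced in wherever it is needed.
    private const string A = "[acegikmoqstz@#&]";

    // Expands to: [acegikmoqstz@#&](?:.*[acegikmoqstz@#&]){2}
    // The doubled braces {{2}} produce a literal {2} in an interpolated string.
    public static readonly Regex ThreeFromClass = new Regex($"{A}(?:.*{A}){{2}}");

    public static void Main()
    {
        Console.WriteLine(ThreeFromClass.IsMatch("a1c2e")); // True: a, c, e are in the class
        Console.WriteLine(ThreeFromClass.IsMatch("abd"));   // False: only 'a' is in the class
    }
}
```

If a fragment comes from user input instead of a constant, it would need `Regex.Escape` before being spliced into the pattern.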
Haney