For marshalling purposes and automatic type building against a native API, I need to validate the field names of some structures at runtime. The naming rules on the native side are the same as in C# (no whitespace, no special characters like &, é, *, .).

Is there a standard regex pattern for that?

NB: As a workaround, I'm thinking of building a static method around DeclareProperty in TypeBuilder just for name validation purposes.

CitizenInsane
  • possible duplicate of [Regex to remove all special characters from string?](http://stackoverflow.com/questions/3303420/regex-to-remove-all-special-characters-from-string) – Shashank Shekhar Feb 26 '15 at 16:58
  • [A lot of Unicode characters (especially accented letters like `é`) are perfectly valid identifier names in C#](https://msdn.microsoft.com/en-us/library/aa664670%28v=vs.71%29.aspx). – CodeCaster Feb 26 '15 at 17:00
  • @CodeCaster Damned ... it's true ... never ever tried, but true ... so I definitely won't let `Typebuilder.DeclareProperty` do the validation for me. – CitizenInsane Feb 26 '15 at 17:02
  • Have you tried a simple Google search? Here is a link, for example, that could help you get started with `RegEx`. Of course there are other built-in string functions like `.Contains` in .NET that you can use as well: http://stackoverflow.com/questions/12350801/check-string-for-invalid-characters-smartest-way – MethodMan Feb 26 '15 at 17:02
  • @MethodMan I know about regex, just wanted to know if there was a standard pattern to use for name validation instead of reinventing the wheel ... but ok ... I will reinvent. – CitizenInsane Feb 26 '15 at 17:06
  • @CitizenInsane I understand your point, but what is valid for some may be considered invalid for others, so I would venture to say the answer to your question is no. Sorry. – MethodMan Feb 26 '15 at 17:09

2 Answers

So do you want to remove the illegal characters (a blacklist), or just check whether the identifier is valid? For valid characters, you might use something like:

// Match an identifier - Matches "type1" but not "&type1" or "#define".
\b(_\w+|[\w-[0-9_]]\w*)\b

Here is the MSDN reference (although it is for Visual Studio). Here is the Regex Patterns reference.
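
As a rough sketch of how the pattern above could be used for validation in C#: with `\b` on both sides it finds an identifier anywhere inside a string, so checking a whole field name needs anchors instead. The class and method names below (`IdentifierPattern`, `IsMatch`) are made up for illustration, and swapping `\b` for `\A`/`\z` is an assumption about the intent.

```csharp
using System.Text.RegularExpressions;

static class IdentifierPattern
{
    // Same alternation as the pattern above, but anchored with \A and \z
    // so that the whole string must match, not just a word inside it.
    // [\w-[0-9_]] is .NET character class subtraction: any word character
    // except a digit or an underscore.
    private static readonly Regex Identifier =
        new Regex(@"\A(_\w+|[\w-[0-9_]]\w*)\z");

    public static bool IsMatch(string name) =>
        name != null && Identifier.IsMatch(name);
}
```

Because `\w` is Unicode-aware in .NET, this accepts accented names such as `été`. It rejects a lone `_`, which C# itself would accept.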

Ryan

Should be enough for what I want so far:

^[a-zA-Z][a-zA-Z0-9_]*$
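
A minimal sketch of wrapping this ASCII-only rule in a helper. The names `FieldNameValidator` and `IsValidFieldName` are hypothetical. The sketch uses `\A` and `\z` rather than `^` and `$`, because in .NET `$` also matches just before a trailing newline, so `"abc\n"` would otherwise pass.

```csharp
using System.Text.RegularExpressions;

static class FieldNameValidator
{
    // One ASCII letter first, then any number of ASCII letters,
    // digits, or underscores. No leading underscore, no Unicode letters.
    private static readonly Regex FieldName =
        new Regex(@"\A[a-zA-Z][a-zA-Z0-9_]*\z");

    public static bool IsValidFieldName(string name) =>
        name != null && FieldName.IsMatch(name);
}
```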
CitizenInsane
  • An identifier can also start with an `_`, as per https://learn.microsoft.com/en-us/dotnet/csharp/fundamentals/coding-style/identifier-names – Remi Despres-Smyth Jun 23 '21 at 18:38
  • @RemiDespres-Smyth You're right ... I removed `_` because the targeted API was Matlab MxArray (Matlab does not allow names starting with `_`; nevertheless, its API doesn't make the test, so you can create invalid names if not careful). – CitizenInsane Jun 24 '21 at 08:45
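
For names that should follow C#'s own rules instead, including a leading `_` and accented letters as the comments point out, a Unicode-category pattern is one possible approach. This is only a sketch: `CSharpIdentifierCheck` and `LooksLikeIdentifier` are made-up names, and it deliberately ignores reserved keywords, the `@` prefix, Unicode escape sequences, and characters outside the Basic Multilingual Plane.

```csharp
using System.Text.RegularExpressions;

static class CSharpIdentifierCheck
{
    // Start: a letter (\p{L}), a letter number (\p{Nl}), or an underscore.
    // Rest: letters, letter numbers, combining marks (Mn, Mc), decimal
    // digits (Nd), connector punctuation (Pc), or format characters (Cf).
    private static readonly Regex Identifier = new Regex(
        @"\A[\p{L}\p{Nl}_][\p{L}\p{Nl}\p{Mn}\p{Mc}\p{Nd}\p{Pc}\p{Cf}]*\z");

    public static bool LooksLikeIdentifier(string name) =>
        name != null && Identifier.IsMatch(name);
}
```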