Determine if a Regex object will only accept upper case chars

Question

In the system I am working on, regular expressions are used to enforce specific input formats for WPF TextBoxes.

A behavior is assigned a Regex object, checks the characters as they are typed, and only lets the valid ones through. (The solution is similar to this article.)

There is one exception, however. When only upper case chars will be accepted, the chars being typed should be automatically converted to upper case instead of being rejected.

My question is:

How to elegantly determine that the regular expression, supplied in a Regex object, will only accept upper case? Is the only option to test a lower case string and then an upper case string against it? Example:

if (Regex.IsMatch("THIS SHOULD PASS") && !Regex.IsMatch("this should fail"))
{
    // logic to convert lower case to upper case.
}

You can do `if ((Regex.Options & RegexOptions.IgnoreCase) == RegexOptions.IgnoreCase)` to see if the case-**in**sensitivity Option has been set. But that doesn't tell you anything about whether the pattern itself accepts lower case letters. There's this: `protected internal string pattern` but that _["is not intended to be used directly from your code"](http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.pattern(v=vs.110).aspx)_. Otherwise you could examine that to see if the pattern to be used matches `[a-z]`? (not a C# developer, protected might mean you can't) — asontu, Jan 20 '15 at 15:50
Are the regular expressions user-supplied, or could you feasibly supply a "uppercase everything" flag alongside each? Because otherwise, you're going to have to write your own regex parser, and believe me, that's not fun. — Rawling, Jan 20 '15 at 16:08
@Rawling. Some regular expressions are currently stored as constant in my code, so I could always compare the pattern against the one for upper case. It however makes the code less re-usable outside of current code base. I can also pass an extra boolean around and define a new dependency property on my validation behavior to simplify things. This implies greater code modification however. — Louis, Jan 20 '15 at 16:17
@funkwurm. Regex.ToString() will return the pattern itself. I suppose inspecting the pattern is the best option for now. — Louis, Jan 20 '15 at 16:21
@Louis If you're writing the regular expressions, why do you need to test to see if it only accepts upper case? Also, if you are supposed to allow a lower case letter be typed, then convert it, that would mean you must accept, then change the lower case letter to upper case before you actually test it against a regex. — Kcvin, Jan 20 '15 at 16:24
This is not a trivial task. For example: `(?=[ABC123])[A-Za-z]`. What algorithm is smart enough to recognize that this can only match `[ABC]`? In case you're wondering, the answer is "a very complicated one". — Kendall Frey, Jan 20 '15 at 16:40
The solution you've already identified -- to probe the regex by providing upper and lower case strings known to pass/fail -- seems very reliable, assuming you know enough about the regex to be able to provide appropriate probe strings. Do you? Know enough about the regex, that is? If not, then I think the best solution is to encapsulate a flag with the regex, requiring the author of the regex to set the flag, and just depend on that. Making this determination on arbitrary regexs seems completely impractical to me (probably _impossible_, i.e. NP-complete). — Peter Duniho, Jan 20 '15 at 18:20
@Peter Duniho. Yes, as mentioned in a previous comment, I do have access to a few recurring regular expression pattern constants in my code. The upper case one is typically "^[A-Z\-]+$". I can apply the YAGNI principle and assume the same pattern will always be used for upper case, which greatly simplifies this problem. Thanks all for your comments. — Louis, Jan 20 '15 at 18:39
Is it very expensive to simply call `ToUpper()` on the input, either after the user submits it or while they are typing? All non-lowercase letters are simply ignored, and already-uppercased letters are skipped as well. After the `ToUpper()` runs, your regex can validate the rest of the input characters are valid. — OnlineCop, Feb 10 '15 at 16:05
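
Two suggestions from the comments above can be combined: store an explicit "upper case" flag alongside the regex, and upper-case the typed text before validating it. The following is only a minimal sketch under that assumption. `InputRule`, `ForceUpperCase`, and `Accept` are hypothetical names for illustration, not part of WPF or of the asker's code base.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: pairs a Regex with an explicit flag, so the behavior
// never has to infer case rules from the pattern itself.
public sealed class InputRule
{
    public Regex Pattern { get; }
    public bool ForceUpperCase { get; }

    public InputRule(Regex pattern, bool forceUpperCase)
    {
        Pattern = pattern ?? throw new ArgumentNullException(nameof(pattern));
        ForceUpperCase = forceUpperCase;
    }

    // Returns the text to accept, or null if the input should be rejected.
    public string Accept(string typed)
    {
        // Upper-case first (when requested), then validate the converted text.
        string candidate = ForceUpperCase ? typed.ToUpperInvariant() : typed;
        return Pattern.IsMatch(candidate) ? candidate : null;
    }
}
```

With `new InputRule(new Regex(@"^[A-Z\-]+$"), forceUpperCase: true)`, calling `Accept("abc-def")` returns `"ABC-DEF"`, while `Accept("abc1")` returns `null` because the digit is still invalid after conversion. The cost is the extra flag the author of each regex has to set, which is the trade-off Louis describes above.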

score 1 · Answer 1 · answered Mar 15 '15 at 22:57

I got bored and took a shot at this.

Here's a sad but elegant implementation.

This doesn't catch Unicode or hexadecimal escapes.

There are probably some other bugs. Write unit tests.

Feel free to extend it.

TylerY86

I'm assuming you want to do this to preserve some internal state of the Regex object. Understandable. It's probably better to clone the Regex and just test the clone. You can use these methods to point to a location in a Regex string and say "Look there, there's where you're accepting uppercase/lowercase." – TylerY86 Mar 15 '15 at 22:59
Er, when I said "test the clone", I didn't mean with the sad but elegant implementation, I mean actually test it against some example text... which reminds me, you can't always create matching example text for partially accepting expressions, so... I suppose there's your use case. In case it's not immediately obvious, because these are not exclusive tests but inclusive, to see if an expression will only take uppercase, you need to check if it accepts uppercase and does not accept lowercase to get an exclusive result (as the OP expressed)... – TylerY86 Mar 15 '15 at 23:12
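
The exclusive check described in the comment above (accepts upper case and does not accept lower case) can be sketched as a probe against a sample string. This is only an illustration: `CaseProbe` and `AcceptsOnlyUpperCase` are hypothetical names, and the caller is assumed to supply an upper-case sample that the regex is already known to match.

```csharp
using System.Text.RegularExpressions;

public static class CaseProbe
{
    // True when the regex matches the upper-case sample but rejects the same
    // text in lower case. The result only says something about this sample;
    // it is not a proof about every input the pattern can match.
    public static bool AcceptsOnlyUpperCase(Regex regex, string upperSample)
    {
        string lowerSample = upperSample.ToLowerInvariant();
        return regex.IsMatch(upperSample) && !regex.IsMatch(lowerSample);
    }
}
```

For the pattern `^[A-Z\-]+$` and the sample `"ABC-DEF"` the probe returns true. It returns false for `^[A-Za-z]+$`, and also for `^[A-Z]+$` constructed with `RegexOptions.IgnoreCase`, since both accept the lower-case sample. As noted in the comments, a suitable sample cannot always be constructed for an arbitrary pattern.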

score 0 · Answer 2 · answered Jul 20 '16 at 19:29

How to elegantly determine that the regular expression will only accept upper case?

If the pattern detection scheme detects the pattern [a-z], that is a sign to use this replacement:

Regex.Replace("OmegaMan", "[a-z]", (mt) =>
 {
    return mt.Groups[0].Value.ToUpper();
 });

Output

OMEGAMAN

This operation replaces every lower case letter in the input with its upper case equivalent, leaves the upper case letters untouched, and returns the whole text.
