
Is there a way to determine that a single char is valid when a regular expression expects a specific number of that char?

I have a custom WPF keyboard and would like to enable or disable each key based on a regular expression. This works well when the expression is fairly simple and does not require a specific order of characters or a specific length to satisfy the pattern.

However, when the pattern becomes more complex and specific, testing a single char against it will always fail.

For instance, given the regular expression [a-zA-Z0-9]{4}

These values will succeed:

  • ABCD
  • abcd
  • 1234
  • A23e

The expression clearly expects alphanumeric characters only. I would like a method that, given the expression, rejects a special character such as "%" but accepts "a", because "a" is allowed by [a-zA-Z0-9]. The only problem is that a single character cannot satisfy the required length.

I am currently using Regex.IsMatch. I guess I am looking for a partial match testing method.
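A minimal sketch of the limitation described above, using the pattern from the question with the standard Regex.IsMatch. The class name is only illustrative. A valid first character and an invalid one give the same result, so the call cannot tell them apart:

```csharp
using System;
using System.Text.RegularExpressions;

class PartialMatchProblem
{
    static void Main()
    {
        const string pattern = "[a-zA-Z0-9]{4}";

        // A complete input satisfies the pattern.
        Console.WriteLine(Regex.IsMatch("ABCD", pattern)); // True

        // A valid first character fails because the length is not reached yet.
        Console.WriteLine(Regex.IsMatch("a", pattern));    // False

        // An invalid character fails as well, with no way to distinguish the two cases.
        Console.WriteLine(Regex.IsMatch("%", pattern));    // False
    }
}
```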

Louis
  • I have read your question twice and I still don't get what you want to do. Maybe just replace everything that's not alphanumeric `[^a-zA-Z0-9]+`? – HamZa Feb 11 '15 at 22:20
  • @HamZa I read it as "is there any chance that substring I have will eventually (when added more characters) satisfy given regex". – Alexei Levenkov Feb 11 '15 at 22:22
  • Yes, I am looking for a partial match. If expression expects "abc", I want a method that tells me that with "a" I am on the right track, but not with "1" – Louis Feb 11 '15 at 22:24
  • @AlexeiLevenkov The answer would be obviously "no" unless you make the regex optional. Which defeats the whole point of using it in the first place. – HamZa Feb 11 '15 at 22:24
  • Louis - are there any restrictions on the regular expression (like "always in a form of ....") or do you want to check for a partial match against a generic regex? – Alexei Levenkov Feb 11 '15 at 22:24
  • @AlexeiLevenkov, I would like to keep generic if possible as I didn't write the regular expressions myself. – Louis Feb 11 '15 at 22:26
  • @Louis You're overcomplicating things a lot. Just tell the user which characters are allowed and that's it. Especially if you aren't proficient in regex. – HamZa Feb 11 '15 at 22:26
  • @HamZa Why "no"? Trivial approach like have string that matches and try replacing all sub-strings with one to test and check if it is still matching sounds fair to me... "abcd" and new char "%" - none of "%bcd", "a%cd", "ab%d", "abc%" match - so typing any other characters next to "%" unlikely to produce match. – Alexei Levenkov Feb 11 '15 at 22:28
  • @HamZa I realise this is a rather complex case where a generic approach may not be feasible. I would have thought that Regex would allow such a test however, as it can break down matching portions of a given string. This is one step further. – Louis Feb 11 '15 at 22:29
  • @HamZa - totally agree on "over-complicating" statement - list of characters would be so much simpler to understand/code. – Alexei Levenkov Feb 11 '15 at 22:30
  • @AlexeiLevenkov See the quantifier `{4}`, it will fail in the cases you provided. But I get the gist. I forgot that this regex isn't anchored. Basically expecting some magic code to work for any generic regex is a bit optimistic IMHO. – HamZa Feb 11 '15 at 22:30
  • @HamZa but "ab%d" will fail due to the wrong char, not the length - but getting a generic enough filler string (a completely correct one) is probably way too complex... Anyway I don't believe there is any sane way to get a partial match for a generic regex - parsing/reconstructing may be something to look into for entertainment purposes (but not really useful for real code) – Alexei Levenkov Feb 11 '15 at 22:33
  • thanks all for your inputs. BTW, I am already providing feedback to user when their input is invalid. I just did not foresee this limitation when I started to dynamically adjust the keys. I will revisit the design with that in mind now. – Louis Feb 11 '15 at 22:39
  • @Louis For the sample example you provided, you could add another regex (check). The regex could look like `[^a-zA-Z0-9]`, this will match any non-alphanumeric character. I'm guessing that .NET could return the offset. With the offset and captured invalid character, you could probably work something out. Now you will need to find the "inversion" of all your regexes and apply such logic... – HamZa Feb 11 '15 at 22:44
  • @Alexei there is a sane way actually - see my answer ;) – Lucas Trzesniewski Feb 11 '15 at 22:56

1 Answer


Sure, you can, but not using the built-in regex engine unfortunately. You can use PCRE instead, which provides the partial matching feature you're asking for.

From the PCRE docs:

In normal use of PCRE, if the subject string that is passed to a matching function matches as far as it goes, but is too short to match the entire pattern, PCRE_ERROR_NOMATCH is returned. There are circumstances where it might be helpful to distinguish this case from other cases in which there is no match.

Consider, for example, an application where a human is required to type in data for a field with specific formatting requirements. An example might be a date in the form ddmmmyy, defined by this pattern:

 ^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$

If the application sees the user's keystrokes one by one, and can check that what has been typed so far is potentially valid, it is able to raise an error as soon as a mistake is made, by beeping and not reflecting the character that has been typed, for example. This immediate feedback is likely to be a better user interface than a check that is delayed until the entire string has been entered. Partial matching can also be useful when the subject string is very long and is not all available at once.

PCRE supports partial matching by means of the PCRE_PARTIAL_SOFT and PCRE_PARTIAL_HARD options, which can be set when calling any of the matching functions. For backwards compatibility, PCRE_PARTIAL is a synonym for PCRE_PARTIAL_SOFT. The essential difference between the two options is whether or not a partial match is preferred to an alternative complete match, though the details differ between the two types of matching function. If both options are set, PCRE_PARTIAL_HARD takes precedence.


But PCRE is a C library... So I've built a PCRE wrapper for .NET.

Usage example from the readme:

var regex = new PcreRegex(@"(?<=abc)123");
var match = regex.Match("xyzabc12", PcreMatchOptions.PartialSoft);
// result: match.IsPartialMatch == true

A little caution though: the wrapper is currently at v0.3 and uses PCRE v8.36, but PCRE v10.0 was released recently (with a new API), so expect some breaking changes in the API of v0.4 of PCRE.NET. The behavior should stay the same though.

And also, you should be aware of the differences between .NET and PCRE regex flavors. This should not be a problem for most cases though.
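As a rough sketch of how the partial-match flag could drive the keyboard from the question: a key stays enabled if the text typed so far plus that key's character either matches the pattern or is reported as a partial match. KeyFilter and IsKeyAllowed are hypothetical names, and the `using PCRE;` namespace and the Success property are assumptions about the wrapper; only PcreRegex, Match, PcreMatchOptions.PartialSoft and IsPartialMatch appear in the readme example above.

```csharp
using PCRE; // assumed namespace of the PCRE.NET wrapper

static class KeyFilter
{
    // Hypothetical helper: returns true when appending 'candidate' to the
    // text typed so far still leaves the input on track to match 'regex'.
    public static bool IsKeyAllowed(PcreRegex regex, string typedSoFar, char candidate)
    {
        var match = regex.Match(typedSoFar + candidate, PcreMatchOptions.PartialSoft);

        // A complete match or a partial one both mean the key is acceptable.
        // 'Success' is assumed to indicate a complete match.
        return match.Success || match.IsPartialMatch;
    }
}
```

For this to be useful the pattern probably needs to be anchored, for example ^[a-zA-Z0-9]{4}$. With an unanchored pattern, a match attempt may begin after an invalid character, so an input such as "%a" can still be reported as a partial match.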

Lucas Trzesniewski
  • Thanks for your detailed answer Lucas. I will look into it later. For now, since I have a fairly small list of regular expressions to support, I will probably create a physical file that will map which keys are enabled based on each expression. This should be dynamic enough for the time being.... – Louis Feb 12 '15 at 14:12