
Requirements for a TextBox control were to accept the following as valid inputs:

  1. A sequence of numbers.
  2. Literal string 'Number of rooms'.
  3. No value at all (left blank). Not specifying a value at all should allow for the RegularExpressionValidator to pass.

The following regex yielded the desired results (it successfully validated all three types of input):

"Number of rooms|[0-9]*"

However, I couldn't come up with an explanation when a colleague asked why the following fails to validate when the string 'Number of rooms' is specified (requirement #2):

"[0-9]*|Number of rooms"

An explanation as to why the ordering of alternation matters in this case would be very insightful indeed.

UPDATE:

The second regex successfully matches the target string "Number of rooms" in a console app as shown here. However, the identical expression in aspx markup does not match when the input is "Number of rooms". Here's the relevant aspx markup:

<asp:TextBox runat="server" ID="textbox1" >
</asp:TextBox>

<asp:RegularExpressionValidator ID="RegularExpressionValidator1" 
EnableClientScript="false" runat="server" ControlToValidate="textbox1" 
ValidationExpression="[0-9]*|Number of rooms" 
ErrorMessage="RegularExpressionValidator"></asp:RegularExpressionValidator>

<asp:Button ID="Button1" runat="server" Text="Button" />
Abhinav
  • For `A sequence of numbers.` you should use `\d+` or `[0-9]+`, not `[0-9]*` as that means **any** number of digits (including **none**). – Oded Apr 20 '12 at 15:08
  • In one case you have "Number of rooms", and in the other "Number of rows". Is that a typo? – Paolo Tedesco Apr 20 '12 at 15:09
  • @Oded: But then it wouldn't match the empty string as per item 3. – Martin Liversage Apr 20 '12 at 15:10
  • @MartinLiversage - It would, as an empty string is no digits at all. – Oded Apr 20 '12 at 15:11
  • Don't you need a `^` and a `$` to make this a meaningful regex? The `[0-9]*` will match any string, otherwise, right? (or does the validator force that the entire string matches?) – agent-j Apr 20 '12 at 15:13
  • @Abhinav: pity, I thought I had found a brilliant answer ;) – Paolo Tedesco Apr 20 '12 at 15:14
  • @Oded: What I'm saying is that the empty string `""` does **not** match the regular expression `\d+`. – Martin Liversage Apr 20 '12 at 15:21
  • @MartinLiversage - True enough. I am just talking about the semantics of `*` vs `+`. – Oded Apr 20 '12 at 15:23
  • I am experiencing the same issue. Trying to match any positive 1 or 2 digit number entered up to and including 50: "50|[0-4]?\d" works, while "[0-4]?\d|50" doesn't (the match fails when entering "50"). I've also tried "([0-4]?\d)|50" and "([0-4]?\d)|(50)" which didn't make any difference. This is happening in ASP.NET 4.0. I can't figure out the reason. – kad81 Nov 13 '12 at 05:04
  • Making the link to recent related questions: some regex engines will match the longer string rather than the first option in an alternation. See [Use the right regex flavor!](https://stackoverflow.com/a/36296918/3216427) and [Regular expressions as finite state automata](https://stackoverflow.com/a/57738489/3216427). – joanis Sep 04 '19 at 14:41

3 Answers


The order matters because it is the order in which the regex engine tries the alternatives.

Case 1: Number of rooms|[0-9]*

In this case the regex engine first tries to match the text "Number of rooms". If that fails, it then tries to match digits or nothing.

Case 2: [0-9]*|Number of rooms

In this case the engine first tries to match digits or nothing. Because [0-9]* can match zero characters, this alternative always succeeds (possibly with an empty match), so the engine never needs to try "Number of rooms".

This is somewhat like the || operator in C#: once the left side matches, the right side is ignored.
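The two cases above can be reproduced with a short sketch. It uses Python's `re` module as a stand-in, on the assumption that for these simple patterns it tries alternatives left to right in the same way as the .NET engine; `first_match` is a hypothetical helper, not part of either library.

```python
import re

def first_match(pattern, text):
    """Return (start index, matched text) of the first match, or None."""
    m = re.search(pattern, text)
    return (m.start(), m.group(0)) if m else None

text = "Number of rooms"

# Case 1: the literal is tried first and consumes the whole input.
literal_first = first_match(r"Number of rooms|[0-9]*", text)

# Case 2: [0-9]* is tried first and succeeds with an empty match at index 0,
# so the second alternative is never attempted.
digits_first = first_match(r"[0-9]*|Number of rooms", text)

print(literal_first)
print(digits_first)
```

Both patterns report a successful match; they differ only in how much of the input the match covers.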

Update: To answer your second question, it behaves differently with the RegularExpressionValidator because the validator does more than just check for a match.

// .....
Match m = Regex.Match(controlValue, ValidationExpression);
return(m.Success && m.Index == 0 && m.Length == controlValue.Length); 
// .....

It is checking for a match as well as making sure the length of the match is the whole string. This rules out partial or empty matches.
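As a rough illustration of that check, the sketch below re-creates the three conditions from the snippet above in Python. `validator_accepts` is a hypothetical name, and Python's `re.search` is assumed to behave like `Regex.Match` for these patterns; it is not the validator's actual code.

```python
import re

def validator_accepts(pattern, value):
    """Mimic: m.Success && m.Index == 0 && m.Length == controlValue.Length."""
    m = re.search(pattern, value)
    return m is not None and m.start() == 0 and len(m.group(0)) == len(value)

inputs = ("123", "Number of rooms")

# Literal first: both inputs are matched in full.
literal_first = {v: validator_accepts(r"Number of rooms|[0-9]*", v) for v in inputs}

# Digits first: "Number of rooms" yields an empty match of length 0,
# which fails the length comparison even though the match itself succeeded.
digits_first = {v: validator_accepts(r"[0-9]*|Number of rooms", v) for v in inputs}

print(literal_first)
print(digits_first)
```

This is why the same expression appears to work in a console app that only inspects `Success` but is rejected once the match length is compared with the input length.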

Matthew Manela
  • Thanks. But if nothing will always match and 'it never needs to try "Number of rooms"', then why it fails to match when the string 'Number of rooms' is specified? – Abhinav Apr 20 '12 at 15:21
  • Since empty string will match the beginning of 'Number of rooms'. A regex with * can always match 0 things. – Matthew Manela Apr 20 '12 at 15:33
  • if the beginning of the string "Number of rooms" proves to be a 'match' for an empty string, why the RegEx fails? – Abhinav Apr 20 '12 at 15:57
  • It doesn't fail. It just matches the empty string. If you run this [code](http://pastebin.com/yzr6Bqcd) it will show that it does succeed. – Matthew Manela Apr 20 '12 at 16:08
  • You're right. It doesn't fail in console app but strangely enough the same regex, when specified in aspx markup, fails to match the input "Number of rooms". Please refer to the updated question. – Abhinav Apr 20 '12 at 17:01
  • Appreciate your prompt response and the awesome replies! Guess it was never about ordering of alternatives in regex for this particular case but more about how the RegularExpressionValidator validates. – Abhinav Apr 20 '12 at 18:38
  • In general, I tend to avoid using patterns that can match the empty string anywhere in an alternation; instead, I use non-empty patterns and then make the entire group optional: /[0-9]+|Number of rooms|/ or /([0-9]+|Number of rooms)?/ – Thomas S. Trias Feb 06 '15 at 16:13

The point is that [0-9]*, when listed first, matches an empty string before the other alternative is considered.
If you specify that the whole string must consist of digits, it should work:

^[0-9]*$|Number of rooms

Unless you specify ^ and $ to indicate that the whole string must match, an empty string is matched at the beginning of "Number of rooms", and at that point the second alternative is not tried.
I hope this answers the question in your comment; I'm not sure if it's clear...
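A small sketch of the anchored pattern follows, again in Python as a stand-in for .NET. The `full_length_match` helper is hypothetical; it is meant to approximate a check that the match starts at index 0 and spans the whole input.

```python
import re

def full_length_match(pattern, value):
    """True if the first match starts at index 0 and covers the whole value."""
    m = re.search(pattern, value)
    return m is not None and m.start() == 0 and len(m.group(0)) == len(value)

anchored = r"^[0-9]*$|Number of rooms"

# With ^ and $ around the digit alternative, an empty match at the start of
# "Number of rooms" is no longer possible, so the engine falls through to
# the literal alternative.
results = {v: full_length_match(anchored, v)
           for v in ("", "123", "Number of rooms", "12 rooms")}
print(results)
```

Note that the anchors apply only to the first alternative; the literal on the right is left unanchored, exactly as in the pattern above.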

Paolo Tedesco
  • Thanks. That works! However, even though it tries to match empty strings which with the input string 'Number of rooms' isn't the case, why it doesn't try the other alternative? – Abhinav Apr 20 '12 at 15:27
  • It seems to be an asp.net specific issue that seems to have troubled my understanding of regular expressions. Checkout of the updated question – Abhinav Apr 20 '12 at 17:13

You probably wanted to use the regex Number of rooms|[0-9]+ or [0-9]+|Number of rooms, because the pattern [0-9]* (with a star) will always match at least the empty string (* means {0,}, i.e. "zero or more").
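To illustrate, here is a minimal sketch in Python (standing in for .NET) showing that with + the order of the alternatives no longer changes the outcome. `full_length_match` is a hypothetical helper that requires the first match to span the entire input. The sketch does not model how a validator treats a blank input, so the empty-string result reflects the regex alone.

```python
import re

def full_length_match(pattern, value):
    """True if the first match starts at index 0 and covers the whole value."""
    m = re.search(pattern, value)
    return m is not None and m.start() == 0 and len(m.group(0)) == len(value)

inputs = ("42", "Number of rooms", "12 rooms", "")

literal_first = [full_length_match(r"Number of rooms|[0-9]+", v) for v in inputs]
digits_first = [full_length_match(r"[0-9]+|Number of rooms", v) for v in inputs]

# Neither alternative can match the empty string any more, so both orders
# agree; the empty input is rejected by the regex itself.
print(literal_first)
print(digits_first)
```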

Ωmega