I have the following regular expression validator to detect whether an input string contains HTML/script tags and, if so, cause a validation error:

<asp:TextBox ID="txt" runat="server" />
    <asp:RegularExpressionValidator 
        ControlToValidate="txt" 
        runat="server"
        ID="regexVal"
        EnableClientScript="true"  Display="Dynamic"
        ErrorMessage="Invalid Content" 
        Text="!" 
        ValidationExpression=">(?:(?<t>[^<]*))" />

When I run the page hosting this markup I get a script error with the message "Syntax Error in Regular Expression". However, when I take the same regex and run it using the Regex class from System.Text.RegularExpressions, everything works fine, like so:

Regex r = new Regex(">(?:(?<t>[^<]*))");
r.IsMatch(@"<b>This should cause a validation error</b>");
r.IsMatch("this is fine");

What am I missing?

UPDATE: The error seems to be happening in the following js function in WebResource.axd:

function RegularExpressionValidatorEvaluateIsValid(val) {
    var value = ValidatorGetValue(val.controltovalidate);
    if (ValidatorTrim(value).length == 0)
        return true;
    var rx = new RegExp(val.validationexpression); //this is the line causing the error
    var matches = rx.exec(value);
    return (matches != null && value == matches[0]);
}
Abhijeet Patel
  • I do not have problem running your code, which browser are you using? – o.k.w Dec 01 '09 at 07:04
  • FF 3.5 and IE 8. The script error is thrown when the default browser is set to IE and the project is run in debug mode. – Abhijeet Patel Dec 01 '09 at 07:26
  • I think it is because on the client side the regex is evaluated by JavaScript, so it has to comply with the JavaScript regex flavour. JavaScript does not support named captures, so the regex should be simplified to >(?:([^<]*)) – Huppie Dec 01 '09 at 07:55
  • @Huppie: That's true. However, `>(?:([^<]*))` doesn't seem to work either. Have you tested it? – o.k.w Dec 01 '09 at 08:02
  • EnableClientScript="true" is the default. – Marcel Nov 15 '17 at 12:35

5 Answers

I think the problem is that JavaScript does not understand .NET's regular expression syntax for named groups.

When EnableClientScript is true on the RegularExpressionValidator, ASP.NET passes your regular expression to JavaScript to enable client-side validation on your controls. In this case, the browser's JavaScript engine doesn't support the named group syntax (?<t>...). The non-capturing group (?:...) is valid in JavaScript, so the named group is what triggers the syntax error. The expression works in .NET, but the browser cannot parse it.

From RegularExpressionValidator Control (General Reference) on MSDN :

On the client, JScript regular expression syntax is used. On the server, Regex syntax is used. Because JScript regular expression syntax is a subset of Regex syntax, it is recommended that you use JScript regular expression syntax in order to yield the same results on both the client and the server.

There are two ways you can correct this:

  1. Disable the client-side script generation and have the regular expression execute on the server side only. You can do this by setting EnableClientScript to false.
  2. Modify the regular expression to remove the named group. The non-capturing group (?:...) is valid in JavaScript but is not needed here either. If you need capturing in your regular expression, the plain (...) syntax works in both JavaScript and .NET; you then use ordinal number references to access captured values ($1, $2, etc.). Something like >[^<]* parses in both flavors. See Grouping Constructs on MSDN.
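
As a minimal sketch of option 2, the JavaScript below uses a plain numbered group in place of the named group and reads the captured text by position. The sample input and the variable names are assumptions made for illustration; they do not come from the original page.

```javascript
// Pattern with a plain numbered group instead of the .NET-style named group (?<t>...).
const pattern = new RegExp(">([^<]*)");

// Illustrative input (an assumption for this sketch).
const sample = "<b>bold</b>";

const match = pattern.exec(sample);

// match[0] is the whole match, match[1] is the first numbered group.
const wholeMatch = match ? match[0] : null;
const firstGroup = match ? match[1] : null;

console.log(wholeMatch, firstGroup);
```

The same pattern string is also valid for the .NET Regex class, where the group would be read by index rather than by name.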

I'd like to point out a couple of other issues:

  • Your original regular expression doesn't need capturing at all if all you want is to check for the presence of a closing angle bracket (>). It could be rewritten as >[^<]*, which is simpler and matches exactly the same text. It won't capture any values from the input, but since you're using it in an ASP.NET validation control this doesn't matter.
  • A RegularExpressionValidator treats the input as valid only when the match succeeds. In your case, validation will pass if your textbox contains something like >blah. I think you want it to work the other way around.
  • Even if you change the regular expression to >[^<]*, it still won't behave the way I think you intend. The validation control requires the match to cover all of the text in the textbox. So if I enter >blah in the textbox, it will match, but <b>blah</b> won't, because the expression effectively requires the string to start with a >. I would suggest trying something like .*>.*[^<]* to allow text before the >.
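
The whole-input matching described above can be reproduced outside the page. The sketch below mirrors the logic of the RegularExpressionValidatorEvaluateIsValid function quoted in the question; the function name isValidLikeValidator is hypothetical, and String.prototype.trim stands in for the page's ValidatorTrim helper, which is an assumption.

```javascript
// Mirrors the client-side check quoted in the question: empty input is valid;
// otherwise the first match must be equal to the entire value.
// value.trim() is an assumed stand-in for ValidatorTrim.
function isValidLikeValidator(expression, value) {
    if (value.trim().length === 0) {
        return true;
    }
    const rx = new RegExp(expression);
    const matches = rx.exec(value);
    return matches !== null && value === matches[0];
}

const results = {
    gtOnly: isValidLikeValidator(">[^<]*", ">blah"),
    html: isValidLikeValidator(">[^<]*", "<b>blah</b>"),
    plain: isValidLikeValidator(">[^<]*", "this is fine"),
    htmlLoose: isValidLikeValidator(".*>.*[^<]*", "<b>blah</b>")
};

console.log(results);
```

With these inputs, only >blah passes the first pattern, and the looser pattern accepts the HTML input while plain text without a > is rejected. That is the inversion mentioned in the second point: the expression has to describe the input that is allowed.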
dariom
  • Thanks for the clarification. That makes a lot of sense now. I'd still like to get an equivalent regex that achieves the same end result as the original regex i.e detect html tags/content so that I can flag it as a validation error. Any ideas? – Abhijeet Patel Dec 02 '09 at 04:03
  • `[^<>]*` might be a starting point for your `RegularExpressionValidator`. It will try and match strings containing anything except angle brackets. Please note that parsing HTML with regular expressions is generally a bad idea: http://stackoverflow.com/questions/1816255/when-is-it-wise-to-use-regular-expressions-with-html (great links in that question - good to follow them!). In this instance, detecting HTML in input might be OK though... – dariom Dec 02 '09 at 07:12
  • Wouldn't this also detect less than and greater than symbols in general as matches such as "price must be >40 and <100"? In the use case I'm dealing with such inputs are considered valid. Only inputs such as "" or "I'm bold" and the like are considered invalid. – Abhijeet Patel Dec 02 '09 at 07:32
  • How about something like this? ^.*<\w+>.*$ – Abhijeet Patel Dec 02 '09 at 07:55
  • Yes, my suggested regular expression would block "price < 10", etc. I did say it was just a start :-) What you're aiming for is possible, but tricky because of the way `RegularExpressionValidator` works (you have to invert the logic of the regular expression - i.e. you specify which patterns are allowed - not disallowed). Your expression `^.*<\w+>.*$` wouldn't prevent something like "" being entered. This question deals with the error in your `RegularExpressionValidator` control. I'd recommend a new question to find an appropriate regular expression pattern to do what you want – dariom Dec 02 '09 at 11:36
  • Fair enough. Your answer is the best one. Thanks for all your help – Abhijeet Patel Dec 03 '09 at 04:21

I managed to find the root cause, but I'm not sure what the resolution is.

Using the Firebug console in FF 3.5, run this to trigger all the client-side validators:

for(var _v=0; _v<Page_Validators.length; _v++){
    ValidatorValidate(Page_Validators[_v]);
}

Then enter some text into the txt textbox and run the script again. An exception is thrown:
"invalid quantifier ?[^<]*))"

Somehow the regex string can't be parsed by the browser's regex engine. I haven't been able to find the alternative regex for it.
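
One way to check a candidate expression before putting it in the markup is to compile it in the browser console and capture the engine's message. This is a minimal sketch; the helper name tryCompile is hypothetical. Whether the original expression compiles depends on the engine, so the sketch reports the outcome instead of assuming it.

```javascript
// Compile a pattern the same way the validator script does (new RegExp(string))
// and return the engine's error message instead of letting the exception escape.
function tryCompile(expression) {
    try {
        new RegExp(expression);
        return { ok: true, error: null };
    } catch (e) {
        return { ok: false, error: String(e.message) };
    }
}

// The outcome for the original pattern is engine-dependent: engines without
// named-group support reject it, while newer engines may accept it.
console.log(tryCompile(">(?:(?<t>[^<]*))"));

// The simplified pattern uses only syntax common to both flavors.
console.log(tryCompile(">[^<]*"));
```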

o.k.w
  • I'm wondering whether this is an ASP.NET bug in emitting out the javascript for the regex, as I mentioned in the Update, the following line is bombing: var rx = new RegExp(val.validationexpression); – Abhijeet Patel Dec 01 '09 at 07:47
  • I suggest changing to alternative regex or use only server-side validation. I have had regex compatibility issues with some browsers as well. – o.k.w Dec 01 '09 at 07:59

This did the trick for me:

(^[^<>]*$)|(^[^>]*$)|(^[^<]*$)

I wanted to allow the user to be able to use one < or > but not . (This does fail on >anything< but I can live with that)
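
A quick way to see what this expression accepts is to run it over a few inputs. The sample strings below are assumptions chosen for illustration (some echo examples from the comments above), and the variable names are hypothetical.

```javascript
// The expression from this answer: accept input that lacks "<", lacks ">", or lacks both.
const oneBracketKind = new RegExp("(^[^<>]*$)|(^[^>]*$)|(^[^<]*$)");

// Illustrative inputs (assumptions for this sketch).
const samples = [
    "price must be >40",
    "a < b",
    "<b>bold</b>",
    ">anything<",
    "no brackets"
];

// Keep only the inputs the expression accepts.
const accepted = samples.filter(function (s) {
    return oneBracketKind.test(s);
});

console.log(accepted);
```

Because every alternative is anchored with ^ and $, a successful match always covers the whole input, which is what the validator requires. Any input containing both a < and a > is rejected, including >anything<.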

Mister Cook

Thanks to dariom. This seems to be concise and works:

[RegularExpression(@"[^<>]*", ErrorMessage = "No SCRIPT tags please.")]

Bryant

You should try this:

Regex r = new Regex(@">(?:(?[^<]*))");

ade