2

I have an ASP.NET Core 3.0 web application with a form that has several input fields. The input fields are bound to a model and already have some validation. However, in one of the fields I want to prevent the user from entering URLs or even email addresses (the URLs are more important at the moment).

My idea is the following: after the form is submitted, check the text in that field on the server side and, if it contains a URL, remove it or invalidate it (by adding some spaces, for example). Since the users' input will later be displayed on the web site, my goal is to keep any URLs from being active or displayed at all, so that another user reading that input is not tricked into clicking a link to a malicious web site.

My question is: does .NET Core 3 (or a previous version) already have a mechanism that automatically checks user input for URLs and either removes them, invalidates them, or gives a validation error? I was going to code the whole logic myself, but if this is already done (in .NET Core or some other open source library), it would be better and would save me some effort.

I also wonder whether there are custom validators, or even basic .NET validators, that do this. I am fine with having the validation on the server side only, but if by any chance there is client-side validation for this, it would be even better.

So far I don't have any specific code to show. I am interested in the general case, so if it helps, you can imagine a normal CRUD form (like those generated by VS).

Any help is appreciated.

Best Regards, Ahmed

== EDIT == Probably I was not clear enough. I am interested in checking whether a text entered by a user contains one or several URLs. If there is any URL in that text, I want to either remove it, somehow invalidate it, or give a validation error. So if the user enters this text:

"Here you can find some crazy deals - http://crazydeals.com/notsocrazydeals and you can buy some high quality toys"

it should be turned into either this:

"Here you can find some crazy deals - and you can buy some high quality toys"

or this

"Here you can find some crazy deals - h t t p : / / c r a z y d e a l s . c o m / n o t s o c r a z y d e a l s and you can buy some high quality toys"

Derrick
Xequtor
  • 1
    Not sure if I was clear enough but I am interested to check if a text contains URLs (one or several) inside it, not whether a string is URL or not . Sorry for the inconvenience. I added more details to the question – Xequtor Nov 15 '19 at 15:14
  • 1
    https://regexr.com/4otrr ? – Luuk Nov 15 '19 at 15:42
  • 3
    I see that turning it into non-clickable text is an OK solution. Then you actually have nothing to do: a user entering a URL won't magically turn this URL into an anchor when rendered. You absolutely need to be sure that no HTML is entered in this field, though, to avoid cross-site scripting, which would be much more problematic than URLs... – Laurent S. Nov 15 '19 at 15:54
  • 1
    .NET validation should take care of HTML tags entered in form input fields. It is worth testing, but this is a default validator that needs to be explicitly disabled to allow HTML entry in forms. See: https://learn.microsoft.com/en-us/aspnet/whitepapers/request-validation – Derrick Nov 15 '19 at 16:02
  • @LaurentS.: Yes, turning it into non-clickable text is OK. The thing is that when the user enters the URL and submits the form, I will store the input in the database. And of course I will encode it in order to avoid recording any HTML content. The ASP.NET application also validates the user input and does not allow entering HTML content, so I hope that I have solved this issue. However, other users can see what has been entered by the malicious user, so they can see a clickable URL, exactly as I did with my question above. This is what I want to avoid. – Xequtor Nov 15 '19 at 19:05
  • 1
    @Derrick: Yep, you are right. The default validator is working fine for me. I don't have problems with entering HTML content (at least so far). Plus, I am encoding the input. – Xequtor Nov 15 '19 at 19:09

2 Answers

2

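Regex is a practical way to solve this, for example with the pattern "https?://\S+", which matches http:// or https:// followed by everything up to the next whitespace. This code removes all such URLs from a string:

// Requires: using System.Text.RegularExpressions;
// The verbatim string (@) lets \S reach the regex engine unescaped.
Regex regx = new Regex(@"https?://\S+", RegexOptions.IgnoreCase);

// Replace every match with an empty string in a single pass.
txt = regx.Replace(txt, "");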
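
The question also mentions the alternative of "invalidating" a URL by spacing out its characters. A minimal sketch of both options follows, assuming that a URL means http:// or https:// followed by non-whitespace characters; the class and method names (UrlScrubber, Remove, SpaceOut) are hypothetical and not part of .NET.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlScrubber
{
    // Assumption: a "URL" is http:// or https:// plus the non-whitespace run after it.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://\S+", RegexOptions.IgnoreCase);

    // Removes each matched URL entirely.
    public static string Remove(string text) =>
        UrlPattern.Replace(text, string.Empty);

    // Inserts a space between the characters of each matched URL,
    // so it no longer reads as a link.
    public static string SpaceOut(string text) =>
        UrlPattern.Replace(text, m => string.Join<char>(" ", m.Value));
}
```

With this sketch, UrlScrubber.SpaceOut("see http://a.b now") returns "see h t t p : / / a . b now". Punctuation that directly follows a URL (a trailing comma or period) is treated as part of the match, and other schemes such as ftp: or mailto: are not covered.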

You can also use a RegularExpressionAttribute to reject a model input based on a pattern. Such an attribute is enforced on both the client side and the server side.

public class TestModel
{
    [RegularExpression(@"^((?![Hh][Tt][Tt][Pp][Ss]?://)[\s\S])*$", ErrorMessage = "URLs are not allowed.")]
    public string Text { get; set; }
}

Here's a Test of the RegularExpressionAttribute:

[TestMethod]
public void TestNotUrl()
{
    var modelFail = new TestModel { Text = "Here you can find some crazy deals - http://crazydeals.com/notsocrazydeals and you can buy some high quality toys" };
    var modelPass = new TestModel { Text = "Here you can find some crazy deals - crazydeals.com and you can buy some high quality toys" };

    var result = new List<ValidationResult>();
    var context = new ValidationContext(modelFail) { MemberName = "Text" };
    var expectNotValid = System.ComponentModel.DataAnnotations.Validator.TryValidateProperty(modelFail.Text, context, result);
    var expectValid = System.ComponentModel.DataAnnotations.Validator.TryValidateProperty(modelPass.Text, context, result);

    Assert.IsFalse(expectNotValid, "Expected modelFail.Text not to validate, as it contains a URL.");
    Assert.IsTrue(expectValid, "Expected modelPass.Text to validate, as it does not contain a URL.");
}
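
RegularExpressionAttribute takes only a pattern string and has no parameter for RegexOptions, so a case-insensitive check has to be written into the pattern itself. An alternative is a small custom attribute. The following is a minimal, server-side-only sketch; NoUrlsAttribute and CommentModel are hypothetical names, and client-side validation would need additional wiring that is not shown here.

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.Text.RegularExpressions;

// Hypothetical attribute; not part of .NET.
[AttributeUsage(AttributeTargets.Property | AttributeTargets.Field)]
public sealed class NoUrlsAttribute : ValidationAttribute
{
    // Assumption: only http:// and https:// links need to be rejected.
    private static readonly Regex UrlPattern =
        new Regex(@"https?://\S+", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    public NoUrlsAttribute() : base("URLs are not allowed.") { }

    public override bool IsValid(object value)
    {
        // Null or non-string values are left to other attributes such as [Required].
        if (!(value is string text)) return true;

        // Valid only when no URL occurs anywhere in the text.
        return !UrlPattern.IsMatch(text);
    }
}

public class CommentModel
{
    [NoUrls]
    public string Text { get; set; }
}
```

Because the check uses IsMatch with RegexOptions.IgnoreCase, it looks for a URL anywhere in the text and treats HTTP:// the same as http://.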
Derrick
  • 1
    I think I will go with that approach and have the function you mention check for the links, then either remove them or invalidate them. I was kinda hoping that maybe there is something, that does this automatically, but nevertheless I think your idea is what I need to do. The RegularExpressionAttribute idea seems like the thing I need, however it doesn't seem to work on my side. I just wonder would that attribute validate whether the whole text is URL or would it look for URLs inside the text. I am guessing the first option but I might be wrong – Xequtor Nov 15 '19 at 19:22
  • It looks for any occurrence of a URL within the text. I'll add a test method to my answer. – Derrick Nov 16 '19 at 00:31
  • 1
    I did it with the Regex approach. It works well for me. Thanks for the support – Xequtor Nov 25 '19 at 05:27
  • That will remove only _web_ URIs. File, FTP, mailto, LDAP, and other schemes will also have to be added - any prefix that ends with a colon, basically. It's actually a little more complicated, according to [RFC3986](https://tools.ietf.org/html/rfc3986) that actually defines the standard. – Suncat2000 Apr 14 '21 at 19:05
  • Great answer. Sadly, if I typed HTTP: as uppercase I was able to bypass this regex validation. Perhaps /i needed to be in place somewhere or by replacing https with [Hh][Tt][Tt][Pp][Ss] - seemed to work. – Dmitri K Mar 13 '22 at 18:38
0

You can create your own validator and validate as follows:

Uri uriResult;
bool result = Uri.TryCreate(uriName, UriKind.Absolute, out uriResult) 
&& (uriResult.Scheme == Uri.UriSchemeHttp || uriResult.Scheme == Uri.UriSchemeHttps);

Reference:

How to check whether a string is a valid HTTP URL?

Azhar Khorasany
  • Not sure if I was clear enough but I am interested to check if a text contains URLs (one or several) inside it, not whether a string is URL or not . Sorry for the inconvenience. I added more details to the question. – Xequtor Nov 15 '19 at 15:15
  • OK. The code above tells you how to identify a URL. You can use the same code to check multiple parts of your string. For example, check for the word "http" in the string. If it is found, split the string by spaces (as URLs don't contain spaces) and apply the above logic to each piece to identify whether it is a valid URL. Then continue through the rest of the string. – Azhar Khorasany Nov 18 '19 at 10:20
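
The split-and-check idea from the comment above could be sketched roughly as follows. This is only an illustration under assumptions: UrlDetector and ContainsHttpUrl are hypothetical names, and the text is split on any whitespace rather than only on spaces.

```csharp
using System;
using System.Linq;

public static class UrlDetector
{
    // Returns true if any whitespace-separated token parses as an absolute http/https URI.
    public static bool ContainsHttpUrl(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return false;

        return text
            // A null separator array makes Split use whitespace as the delimiter.
            .Split((char[])null, StringSplitOptions.RemoveEmptyEntries)
            .Any(IsHttpUrl);
    }

    // Same check as in the answer, applied to a single token.
    private static bool IsHttpUrl(string token) =>
        Uri.TryCreate(token, UriKind.Absolute, out Uri uri)
        && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);
}
```

A token that wraps a URL in other characters, such as "(http://example.com)", does not parse as an http URI and would be missed by this sketch, so a regex-based check may still be the more robust choice.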