Is a string a valid URL or not?

Question

I'm writing a .NET 2010 C# Windows application and use this code to check whether a string is a valid URI.

Code:

static bool IsValidUrl(string urlString)
{
    Uri uri;
    return Uri.TryCreate(urlString, UriKind.Absolute, out uri)
        && (uri.Scheme == Uri.UriSchemeHttp
         || uri.Scheme == Uri.UriSchemeHttps
         || uri.Scheme == Uri.UriSchemeFtp
         || uri.Scheme == Uri.UriSchemeMailto
         );
}

Problem: if I validate http://http://www.Google.com, it is reported as valid, but when I try to open it in IE no site is shown.

Is there any way to find out whether a string is a valid URI (without using regular expressions or Internet access)?

Well, what's the point of not checking against the web? If you don't have Internet access, the user won't be able to reach it anyway ... — Noctis, Nov 15 '13 at 09:17
Whether a URL is of valid format and whether it actually points anywhere are different things. — Grant Thomas, Nov 15 '13 at 09:19
It is. When you take a look at RFC 3986, you will see that it is possible. Your string will be parsed as follows: `Scheme: http, Protocol: http, Resource: //www.Google.com`, which is an invalid URL (for web requests) but a valid URI. — jAC, Nov 15 '13 at 09:32
Thanks for the explanation @JanesAbouChleih. May I know whether there is any way to validate a URL without Internet access? — Civa, Nov 15 '13 at 09:36
@JanesAbouChleih you mean `Host: http` which is completely valid, and could even work, if you were on a lan with a machine called `http` or your hosts file called something `http`. — Jon Hanna, Nov 15 '13 at 11:01
possible duplicate of [How to check whether a string is a valid HTTP URL?](http://stackoverflow.com/questions/7578857/how-to-check-whether-a-string-is-a-valid-http-url) — Nerdroid, Dec 18 '14 at 01:21

Jon Hanna · Answer 1 · 2013-11-15T10:55:49.550

It's not an invalid URI or even a URI that won't ever work: You could use it in a browser somewhere where there was a local machine called "http" (or if you had your Hosts file set to call a machine that).

The problem is that the perfectly correct URI http://http://www.Google.com won't work because it fails to find a machine called "http". (It would normally be written as http://http//www.Google.com, because we normally don't include the : after the host unless we're including the port number.)
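To see that split for yourself, a small sketch along these lines prints the parts that System.Uri extracts. The class name UriParts and the method Describe are made up for this illustration; only Uri.TryCreate and the Scheme, Host and AbsolutePath properties come from the framework.

```csharp
using System;

// Hypothetical helper (not part of the answer): shows how System.Uri
// splits a string into scheme, host and path.
public static class UriParts
{
    // Returns "scheme|host|path" for an absolute URI,
    // or null if the string cannot be parsed as one.
    public static string Describe(string urlString)
    {
        Uri uri;
        if (!Uri.TryCreate(urlString, UriKind.Absolute, out uri))
            return null;

        // Uri.Host is normalised to lower case; the path keeps its casing.
        return uri.Scheme + "|" + uri.Host + "|" + uri.AbsolutePath;
    }
}
```

Calling UriParts.Describe("http://http://www.Google.com") should report http as the host, with the rest of the string ending up in the path, which is why the parser accepts it.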

Now, even if that would work sometimes, it of course wouldn't work all the time. So it's a different problem to that of the URI http://www.thisdoesnotexistbecauseijustmdeitup.com/.

If you need to also detect that case, then there really is no way other than connecting to the Internet.

If you need to detect URIs that will work globally, rather than just on particular LANs then:

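// Requires: using System; using System.Linq; using System.Net; using System.Numerics;
// (plus a reference to System.Numerics.dll)
static bool IsGloballyUsableWebMailorFtpUrl(string urlString)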
{
  Uri uri;
  if(!Uri.TryCreate(urlString, UriKind.Absolute, out uri))
    return false;
  if(uri.Scheme != Uri.UriSchemeHttp
     && uri.Scheme != Uri.UriSchemeHttps
     && uri.Scheme != Uri.UriSchemeFtp
     && uri.Scheme != Uri.UriSchemeMailto)
     return false;
  string host = uri.Host;
  IPAddress ip;
  if(!IPAddress.TryParse(host, out ip)) // We don't have an IP address in the host part.
    // Does the domain have at least one period,
    // and not the "local" binding used on many private networks?
    return host.Contains('.') && !host.EndsWith(".local", StringComparison.OrdinalIgnoreCase);
  var octets = ip.GetAddressBytes();
  if(octets.Length == 4)
    switch(octets[0])//We've an IPv4 IP address, check it's not reserved.
    {
      case 0: case 10: case 127:
        return false;
      case 128:
        return octets[1] != 0;   // 128.0.0.0/16
      case 191:
        return octets[1] != 255; // 191.255.0.0/16
      case 169:
        return octets[1] != 254;
      case 172:
        return octets[1] < 16 || octets[1] > 31;
      case 192:
        return octets[1] != 168 && (octets[1] != 0 || octets[2] != 0);
      case 223:
        return octets[1] != 255 || octets[2] != 255; // 223.255.255.0/24
      default:
        return true;
    }
  else
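    {  //We've an IPv6 IP address, check it's not reserved.
      //GetAddressBytes() returns big-endian bytes, while BigInteger expects
      //little-endian ones; the extra zero byte keeps the value non-negative,
      //so the upper ranges below (fc00:: and above) are actually reached.
      var ipInt = new BigInteger(octets.Reverse().Concat(new byte[] { 0 }).ToArray());
      //Not the neatest approach, but serves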
      if(ipInt < 2)
        return false;
      if(ipInt < 281470681743360)
        return true;
      if(ipInt < 281474976710656)
        return false;
      if(ipInt < BigInteger.Parse("524413980667603649783483181312245760"))
        return true;
      if(ipInt < BigInteger.Parse("524413980667603649783483185607213056"))
        return false;
      if(ipInt < BigInteger.Parse("42540488161975842760550356425300246528"))
        return true;
      if(ipInt < BigInteger.Parse("42540488241204005274814694018844196864"))
        return false;
      if(ipInt < BigInteger.Parse("42540489429626442988779757922003451904"))
        return true;
      if(ipInt < BigInteger.Parse("42540490697277043217009159418706657280"))
        return false;
      if(ipInt < BigInteger.Parse("42540766411282592856903984951653826560"))
        return true;
      if(ipInt < BigInteger.Parse("42540766490510755371168322545197776896"))
        return false;
      if(ipInt < BigInteger.Parse("42545680458834377588178886921629466624"))
        return true;
      if(ipInt < BigInteger.Parse("42550872755692912415807417417958686720"))
        return false;
      if(ipInt < BigInteger.Parse("334965454937798799971759379190646833152"))
        return true;
      if(ipInt < BigInteger.Parse("337623910929368631717566993311207522304"))
        return false;
      if(ipInt < BigInteger.Parse("338288524927261089654018896841347694592"))
        return true;
      if(ipInt < BigInteger.Parse("338620831926207318622244848606417780736"))
        return false;
      if(ipInt < BigInteger.Parse("338953138925153547590470800371487866880"))
        return true;
      if(ipInt < BigInteger.Parse("340282366920938463463374607431768211456"))
        return false;
      return true;
    }
}
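As an illustration only, the IPv4 part of a check like this could also be driven by a table of CIDR ranges rather than a switch. The class name Ipv4Ranges, the method IsNonGlobalIPv4 and the choice of ranges are assumptions made for this sketch. The table is an incomplete subset ("this network", the RFC 1918 private blocks, loopback and link-local), not the list used in the answer.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper (not part of the answer): table-driven IPv4 range test.
public static class Ipv4Ranges
{
    // Each row is { network address, prefix length }.
    // Illustrative subset only; extend it to match your own policy.
    private static readonly uint[,] Ranges =
    {
        { 0x00000000u, 8u },   // 0.0.0.0/8       "this network"
        { 0x0A000000u, 8u },   // 10.0.0.0/8      private
        { 0x7F000000u, 8u },   // 127.0.0.0/8     loopback
        { 0xA9FE0000u, 16u },  // 169.254.0.0/16  link-local
        { 0xAC100000u, 12u },  // 172.16.0.0/12   private
        { 0xC0A80000u, 16u }   // 192.168.0.0/16  private
    };

    // True if host is an IPv4 literal inside one of the ranges above.
    // Host names and IPv6 literals return false; no network access is made.
    public static bool IsNonGlobalIPv4(string host)
    {
        IPAddress ip;
        if (!IPAddress.TryParse(host, out ip)
            || ip.AddressFamily != AddressFamily.InterNetwork)
            return false;

        byte[] b = ip.GetAddressBytes(); // network (big-endian) order
        uint value = ((uint)b[0] << 24) | ((uint)b[1] << 16)
                   | ((uint)b[2] << 8) | (uint)b[3];

        for (int i = 0; i < Ranges.GetLength(0); i++)
        {
            int prefix = (int)Ranges[i, 1];
            uint mask = uint.MaxValue << (32 - prefix);
            if ((value & mask) == Ranges[i, 0])
                return true;
        }
        return false;
    }
}
```

A table keeps each range next to its CIDR notation, which makes it easier to compare against a published list of special-purpose addresses than a set of per-octet conditions.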

Edit: It's worth considering whether you should do this check at all. If it's for an application that will eventually connect to the URI in question, you're just going to annoy users by refusing to connect to machines on their LAN.

Thanks for the interesting answer and explanation, even though it does not meet my requirement. — Civa, Nov 15 '13 at 11:16
@Civa what further requirements do you have? It correctly blocks `http://http://www.Google.com` and cases like it (`http://blah/`, `http://192.168.0.0`), and lets through just about any URI for any real website (`http://www.google.com`, `http://193.120.166.84` etc.) and doesn't hit the network to do so. What other possibilities do you need to allow or disallow beyond that? — Jon Hanna, Nov 15 '13 at 11:54
I am working on an old library digitization process. They give me no guarantee that the web pages are alive right now, so I can't get the IP addresses of such locations, and your solution is not suitable for me. But it's an interesting approach; that's why I gave it +1 earlier. — Civa, Nov 15 '13 at 12:12
@Civa I only pay attention to IP addresses in the case where the URI entered contains it directly, otherwise that's not a factor. — Jon Hanna, Nov 15 '13 at 12:13

varocarbas · Answer 2 · 2015-04-17T07:30:45.090

The best way to know whether a given string represents a valid URL without actually testing it, bearing in mind the comments above (a string might fit the given scheme yet not be what you consider right), is to perform a custom analysis. You should also replace your bool function with one returning a string (or a Uri) that can correct certain situations (like the example you propose). Sample code:

private void Form1_Load(object sender, EventArgs e)
{
    string rightUrl = returnValidUrl("http://http://www.Google.com");
    if (rightUrl != "")
    {
        //It is OK
    }
}

static string returnValidUrl(string urlString)
{
    string outUrl = "";
    Uri curUri = IsValidUrl(urlString);
    if (curUri != null)
    {
        string headingBit = "http://";
        if (curUri.Scheme == Uri.UriSchemeHttps) headingBit = "https://";
        if (curUri.Scheme == Uri.UriSchemeFtp) headingBit = "ftp://";
        if (curUri.Scheme == Uri.UriSchemeMailto) headingBit = "mailto:";

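        // Keep everything after the last occurrence of the scheme prefix. The match is
        // case-insensitive, so the rest of the URL keeps its original casing.
        outUrl = headingBit + urlString.Substring(urlString.LastIndexOf(headingBit, StringComparison.OrdinalIgnoreCase) + headingBit.Length);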
    }

    return outUrl;
}

static Uri IsValidUrl(string urlString)
{
    Uri uri = null;
    bool isValid = Uri.TryCreate(urlString, UriKind.Absolute, out uri)
        && (uri.Scheme == Uri.UriSchemeHttp
         || uri.Scheme == Uri.UriSchemeHttps
         || uri.Scheme == Uri.UriSchemeFtp
         || uri.Scheme == Uri.UriSchemeMailto
         );

    if (!isValid) uri = null;

    return uri;
}

This can be called with:

string rightUrl = returnValidUrl("http://http://www.Google.com");
if (rightUrl != "")
{
    //It is OK
}

You would have to extend this method to recognise as valid/correct all the situations you need.

UPDATE

As suggested in the comments, and to deliver the exact functionality the OP is looking for (or rather a sample of it, since the proposed solution is just an example of the case-by-case approach this problem requires), here is a corrected bool function that treats the posted example as wrong:

static bool IsValidUrl2(string urlString)
{
    Uri uri;
    return Uri.TryCreate(urlString, UriKind.Absolute, out uri)
        && ((uri.Scheme == Uri.UriSchemeHttp && numberOfBits(urlString.ToLower(), "http://") == 1)
         || (uri.Scheme == Uri.UriSchemeHttps && numberOfBits(urlString.ToLower(), "https://") == 1)
         || (uri.Scheme == Uri.UriSchemeFtp && numberOfBits(urlString.ToLower(), "ftp://") == 1)
         || (uri.Scheme == Uri.UriSchemeMailto && numberOfBits(urlString.ToLower(), "mailto:") == 1)
         );
}

static int numberOfBits(string inputString, string bitToCheck)
{
    return inputString.ToLower().Split(new string[] { bitToCheck.ToLower() }, StringSplitOptions.None).Length - 1;
}
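A possible variant of the counting helper, shown only as a sketch: it walks the string with IndexOf and an ordinal, case-insensitive comparison, so neither string has to be lower-cased or split. The names PrefixCounter and CountOccurrences are made up for this example.

```csharp
using System;

// Hypothetical alternative to numberOfBits (not part of the answer).
public static class PrefixCounter
{
    // Counts non-overlapping occurrences of token in input, ignoring case.
    public static int CountOccurrences(string input, string token)
    {
        if (string.IsNullOrEmpty(input) || string.IsNullOrEmpty(token))
            return 0;

        int count = 0;
        int index = 0;
        while ((index = input.IndexOf(token, index, StringComparison.OrdinalIgnoreCase)) >= 0)
        {
            count++;
            index += token.Length; // continue after the match just found
        }
        return count;
    }
}
```

With this helper, a condition such as PrefixCounter.CountOccurrences(urlString, "http://") == 1 plays the same role as the numberOfBits comparisons above.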

CLARIFICATION

The only way to be completely sure whether a given URL is valid is to actually test it; but the OP asked for no connections, which I understood as pure string analysis: exactly what this answer is about. In any case, as explained in the comments, the intention of this post is just to show the way: .NET plus a custom algorithm (understanding that aiming for general applicability through string analysis alone is pretty difficult). My proposal addresses the specific problem explained by the OP (duplicated "heading parts") under his conditions. It should not be taken as a generally applicable, blindly usable approach, but as a general framework with sample functionality (a mere proof of concept).

CLARIFICATION 2

As shown in the conversation with Jon Hanna in the comments below, there is a third alternative I wasn't aware of: analysing the IP address as it is written in the string (that is, the numbers are already there, but the address has not yet been resolved or checked for availability). By looking at it, it is also possible to estimate how likely a given string is to be a valid URL (under the expected conditions). This cannot be considered a 100% reliable process either, since the IP address being analysed is not the definitive one. In any case, Jon Hanna is in a much better position than I am to talk about the limitations of this alternative.

edited Apr 17 '15 at 07:30

answered Nov 15 '13 at 09:53

varocarbas


Though it doesn't stop the URI they complain about. – Jon Hanna Nov 15 '13 at 09:55
@JonHanna The whole point of my approach is not stopping it; but correcting it if possible and stopping it if no other option. I will add a correction right now to stopping it. – varocarbas Nov 15 '13 at 09:58
@JonHanna There you have a boolean function considering the posted example wrong. – varocarbas Nov 15 '13 at 10:06
Can I compare the result with the passed parameter in string returnValidUrl(string urlString)? – Civa Nov 15 '13 at 10:07
@Civa I am not sure what you mean. Both approaches work fine with your example if this is what you refer; and with any equivalent one, that is, duplicated heading parts (`ftp://ftp://`). – varocarbas Nov 15 '13 at 10:09
@Civa my answer includes two approaches: the first one proposes you to replace your original function returning a true/false with one returning a string (if the url is clearly invalid, the string would be ""; if it is right or "correctable", like the one in the example you propose, it would return the right url). My second function IsValidUrl2 is an extension of yours recognising the case where the heading parts are duplicated (it returns false for `http://http://www.Google.com`). But, please, understand the idea I am intending to explain: you have to build up a casuistic approach... – varocarbas Nov 15 '13 at 10:16
@Civa... today you found this problem of `http://http://` but tomorrow you will find that you don't want to consider `http://co.www.url.com` as valid and so on. Thus, my solution to your question is not "take this code and use it blindly"; but a sample of the kind of approach you have to build: one taking into account the .NET capabilities (via Uri Schema, like you are doing) together with a set of custom algorithms finding/correcting situations which shouldn't be considered right. I hope that my point is clearer now. – varocarbas Nov 15 '13 at 10:19
Doesn't catch "http://blah.local/" or "http://192.168.0.2/" which are essentially variants of the same problem as with "http://http" – Jon Hanna Nov 15 '13 at 10:51
@JonHanna I am not sure which sub-version (the function correcting the URL or the one returning true/false) you mean. But, as explained, my answer only delivers a way to account for the problem (.NET + custom algorithm). I have focused my answer on the specific sub-problem referred to by the OP (repeated headings) and have taken his code as a reference. You have to input some "heading" `http://` or equivalent and it detects only exact repetitions `http://http://`. I think that my code is quite clear, and so is my answer. – varocarbas Nov 15 '13 at 10:55
The specific problem of the OP is they want to block LAN-only URIs like `http://http://`. `http://something.local/` and `http://192.168.0.0/` are URIs with the same problem. – Jon Hanna Nov 15 '13 at 10:58
@JonHanna The OP said that he wants to know it without connecting, and your answer is based on connections. I said pretty clearly in my answer which assumptions it starts from (no connection, just string analysis), that I am proposing an alternative (correcting instead of rejecting), and that this is just the basic framework (for a custom approach). Your comments could apply to an answer like yours (e.g., pointing out a string that is wrongly handled, when your answer consists of an algorithm aiming to cover all situations), but I don't think they have a real point with one like mine. – varocarbas Nov 15 '13 at 11:01
@varocarbas What do you mean "your answer is based on connections"? I'm doing static analysis to catch the sort of problem the OP has - a valid URI that only works in certain contexts. – Jon Hanna Nov 15 '13 at 11:13
@JonHanna if the input to a given function is a string (e.g., `http://www.myaddress.com`) and you are able to get a bunch of numbers from it (the IP address), it is because you are getting this information from somewhere; and unless .NET has a collection storing the IPs of all websites (by name), I understand that what you are doing is connecting to the Internet to retrieve this information, which is not what the OP wants. – varocarbas Nov 15 '13 at 11:16
@varocarbas yes, getting the IP address of `myaddress.com` would need to connect to the internet. What does that have to do with my answer? – Jon Hanna Nov 15 '13 at 11:19
@JonHanna that your answer starts from the `IPAddress ip` variable, which you populate with `IPAddress.TryParse(host, out ip)`, which gets the IP for the given string url? – varocarbas Nov 15 '13 at 11:21
@varocarbas By parsing. You don't need to connect to the internet to get the IP address of the URI `http://193.120.166.84://`, but in that case you need the rest of the code to differentiate between one that will work anywhere (like that example) or one that is lan-local like `http://192.168.0.0://` - the same problem that `http://http://` has. – Jon Hanna Nov 15 '13 at 11:25
@JonHanna You agree with the "you can only get the IP address by connecting to internet" statement; so, my question is: where is this IP coming from? – varocarbas Nov 15 '13 at 11:28
@varocarbas No, you can only get the IP address of `myaddress.com` by connecting to the internet, which my answer doesn't try to do. In the case of `myaddress.com` it checks there's at least one period in the host, and it doesn't end with `.local`. In the case of `http://193.120.166.84://` it doesn't need to connect to the internet to know that the IP address is `193.120.166.84`, but it does need to examine the IP address to see if it's reserved or not. You even quoted `IPAddress.TryParse(host, out ip)` in your comment above! – Jon Hanna Nov 15 '13 at 11:31
@JonHanna I see. I thought that the IP address "building process" started when a connection was set (joining the different values and checking whether the given address was available, all together), not before. Thanks for the info and +1 for you (haven't tested your code but assume that it works). – varocarbas Nov 15 '13 at 11:34
Ah. I on my part wasn't getting what part you weren't getting, so things got a bit circular there! – Jon Hanna Nov 15 '13 at 11:41

score 1 · Answer 3 · answered Nov 15 '13 at 09:41

You could write a custom function to check whether http:// (or another initial part) is repeated, alongside the code you have written.

user2986151


I'm not asking about this particular case; I'm looking for a generic solution to my problem. – Civa Nov 15 '13 at 09:45
