
I think regular expressions might be able to accomplish this; if not, string manipulation is also a viable solution.

I need to turn the following inputs:

  1. "http://open.thumbshots.org/image.pxf?url=www.party.com"
  2. "http://www.xclicks.net/sc/ct.php?s=9971&l=http%3A//www.google.com/imgres%3F"
  3. "http://whos.amung.us/pingjs/?k=yvybju40twbs&t=Mudswimmer%3A%20Spam%20%26%20Crap%3A%20Http%3AUniversity.com%3A%20No%20Animals%20Allowed..&c=c&y=htt"

into the following outputs:

  1. "party.com"
  2. "google.com"
  3. "University.com"

I am not trying to get the host name of the URL; I want the second domain, the one in the query string.

João Angelo
  • Take a look here: http://stackoverflow.com/questions/659887/get-url-parameters-from-a-string-in-net – bart s May 16 '12 at 19:10
  • You might have picked a better example for the third url.... – Shai Cohen May 16 '12 at 19:12

2 Answers


Anything that involves regular expressions carries a degree of uncertainty, for me at least, but given your three inputs the following code works:

// Requires: System, System.Linq, System.Text.RegularExpressions, System.Web (HttpUtility)
string[] urls = new string[]
{ 
    "http://open.thumbshots.org/image.pxf?url=www.party.com",
    "http://www.xclicks.net/sc/ct.php?s=9971&l=http%3A//www.google.com/imgres%3F",
    "http://whos.amung.us/pingjs/?k=yvybju40twbs&t=Mudswimmer%3A%20Spam%20%26%20Crap%3A%20Http%3AUniversity.com%3A%20No%20Animals%20Allowed..&c=c&y=htt"
};

foreach (var url in urls)
{
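    // Parse the query string; ParseQueryString also percent-decodes each value.
    var result = HttpUtility.ParseQueryString(new Uri(url, UriKind.Absolute).Query);

    // Enumerating a NameValueCollection yields its keys.
    foreach (string item in result)
    {
        // Single() assumes each key occurs exactly once, as in the sample inputs.
        string value = result.GetValues(item).Single();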

        const string DomainNamePattern = "(?:www\\.|\\b)(?<domain>([a-z0-9]([-a-z0-9]*[a-z0-9])?\\.)+((a[cdefgilmnoqrstuwxz]|aero|arpa)|(b[abdefghijmnorstvwyz]|biz)|(cat|com|coop|c[acdfghiklmnorsuvxyz])|d[ejkmoz]|(e[ceghrstu]|edu)|f[ijkmor]|(g[abdefghilmnpqrstuwy]|gov)|h[kmnrtu]|(i[delmnoqrst]|info|int)|(j[emop]|jobs)|k[eghimnprwyz]|l[abcikrstuvy]|(m[acdghklmnopqrstuvwxyz]|mil|mobi|museum)|(n[acefgilopruz]|name|net)|(om|org)|(p[aefghklmnrstwy]|pro)|qa|r[eouw]|s[abcdeghijklmnortvyz]|(t[cdfghjklmnoprtvwz]|travel)|u[agkmsyz]|v[aceginu]|w[fs]|y[etu]|z[amw]))";

        var match = Regex.Match(
            value,
            DomainNamePattern,
            RegexOptions.IgnoreCase);

        if (match.Success)
        {
            string domain = match.Groups["domain"].Value;

            Console.WriteLine(domain);
        }
    }
}

The regular expression used was adapted from here.

If you run this you get the following output:

// party.com
// google.com
// University.com
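
The regular expression above is only applied after the query string has been parsed and percent-decoded. As a minimal sketch of that first step on its own (the class and method names QueryDecodeSketch and DecodedPairs are made up for illustration and are not part of the answer's code), this lists each decoded key/value pair of the second input:

```csharp
using System;
using System.Linq;
using System.Web;

public static class QueryDecodeSketch
{
    // Hypothetical helper: returns "key = value" strings with values percent-decoded.
    public static string[] DecodedPairs(string url)
    {
        var uri = new Uri(url, UriKind.Absolute);

        // ParseQueryString accepts the leading '?' and decodes %XX escapes.
        var query = HttpUtility.ParseQueryString(uri.Query);

        return query.AllKeys
            .Select(key => key + " = " + query[key])
            .ToArray();
    }

    public static void Main()
    {
        const string url =
            "http://www.xclicks.net/sc/ct.php?s=9971&l=http%3A//www.google.com/imgres%3F";

        foreach (string line in DecodedPairs(url))
        {
            Console.WriteLine(line);
        }
        // s = 9971
        // l = http://www.google.com/imgres?
    }
}
```

Only the value of l contains something that looks like a domain, which is why the loop in the answer prints a single line for this input.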
João Angelo

If your link always contains the url query string key, then you can simply get it with String url = Request.QueryString["url"].ToString(); This will return the value of url.
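
Request.QueryString reads from the current HTTP request. If the URL is only available as a string, as in the question, the same lookup might be sketched as follows. This assumes the key name is known in advance; GetQueryValue is a hypothetical helper, not a framework method:

```csharp
using System;
using System.Web;

public static class QueryValueSketch
{
    // Hypothetical helper: returns the decoded value of `key`,
    // or null when the key is not present in the query string.
    public static string GetQueryValue(string url, string key)
    {
        var uri = new Uri(url, UriKind.Absolute);
        return HttpUtility.ParseQueryString(uri.Query)[key];
    }

    public static void Main()
    {
        string value = GetQueryValue(
            "http://open.thumbshots.org/image.pxf?url=www.party.com", "url");

        Console.WriteLine(value); // www.party.com
    }
}
```

Note that this returns the raw value (www.party.com), so the www. prefix would still need to be stripped to get the output asked for in the question, and it only covers the first input, since the other two use different keys.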

Waqar Janjua