How do you get the second level domain from a given URL?

All the articles I have read so far assume that the host has at least one dot. I need a way to get the second-level domain for a particular URL, including hosts that have no dot at all.

Examples:

http://www.example.com -> example.com
http://sub.example.com -> example.com
http://example.com -> example.com
http://example:1000 -> example
http://localhost:1000 -> localhost

Here is what I tried; it does not work for all of the scenarios above:

var uri = new Uri("http://www.example.com");
var host = uri.Host;
var p = host.LastIndexOf(".");
var domain = host.Substring(p + 1);

(Joe's linked question does not address the main issue behind this question: it only considers domains that have at least one dot in them.)

Prabhu
  • possible duplicate of [Parsing string for Domain / hostName](http://stackoverflow.com/questions/10735190/parsing-string-for-domain-hostname) – Joe Apr 08 '15 at 23:16
  • @Joe the linked question does not address the main issue behind this question: it only considers domains that have at least one dot in them. – Prabhu Apr 08 '15 at 23:23
  • IMHO for `www.example.com` the top-level domain is `com` – vesan Apr 08 '15 at 23:29
  • @vesan what do you call the "example" part? I'll rename it. – Prabhu Apr 08 '15 at 23:30
  • @Prabhu: Second-level domain: http://en.wikipedia.org/wiki/Second-level_domain – vesan Apr 08 '15 at 23:39
  • @vesan It's not just in your esteemed opinion. It's also in [Wikipedia's opinion](http://en.wikipedia.org/wiki/Top-level_domain). – mason Apr 08 '15 at 23:39
  • Essentially I need a way to set the domain when setting cookies to be shared across subdomains. ".example.com" is what I need the domain to be, but I can't hard code it. – Prabhu Apr 08 '15 at 23:51
  • So why don't you write your own method to pull it out of a Uri? This seems like the kind of basic string manipulation a student could do. – mason Apr 08 '15 at 23:57
  • @mason sure why not, but I just wanted to make sure that there wasn't a built-in way to do this so I didn't end up reinventing the wheel. – Prabhu Apr 09 '15 at 00:02
  • @Prabhu - there is no generic method because there is no corresponding generic concept of "this-or-parent domain to set cookies on". Each site has to define its own rules (e.g. for my.site.sample.com, should cookies be set on "my.site.sample.com" or "site.sample.com"? In this case "sample.com" is obviously not going to work...) – Alexei Levenkov Apr 09 '15 at 00:43
  • @AlexeiLevenkov Got it. I ended up writing a string manipulation function. – Prabhu Apr 09 '15 at 23:11

1 Answer

Use:

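using System.Text.RegularExpressions; // for Regex; Uri lives in System

var uri = new Uri("http://sub.domain.com:8080");
// Takes the last host label plus a listed TLD when present; [\w-] also allows hyphens in the label.
var trimmedUri
  = Regex.Match(uri.Host, @"[\w-]+(\.(com|org|edu))?$", RegexOptions.IgnoreCase).Value;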
Colin
  • My bad. I didn't read the question closely enough. Try the above tweak. – Colin Apr 08 '15 at 23:36
  • Did you test this against all example inputs to make sure you got the expected outputs? – mason Apr 09 '15 at 00:04
  • All except the subdomain, which was added after my last edit. I'll need another tweak now that I understand he wants those, too. – Colin Apr 09 '15 at 00:08
  • Done. Works with all test cases. – Colin Apr 09 '15 at 00:16
  • @Colin And what if it ends in `.net` or `.biz`? Or `.co.uk`? Or one of the other global TLDs? Your solution isn't very robust because it presupposes certain TLDs. A better approach would be to look at the *structure* of the strings, rather than the exact content. – mason Apr 09 '15 at 02:00
  • @mason For others just add them to the OR-ed part of regex of course (I provided three as an example of how to extend). Whether that's sufficient depends on the use-case; if it's just a matter of wanting a bit of code to run on both a dev machine and production, it will work fine. It's also one line of code that is simple and readable (always a goal of mine). If the code block is used in a situation that accepts any possible domain, you'd want something more comprehensive, though I'm unaware of a reliable future-proof pattern. No reason to over-engineer without more detail on use case, anyway. – Colin Apr 09 '15 at 16:08
  • You're unaware of a reliable future-proof pattern? The example was pretty clear on the pattern. If the domain name is just `example`, then return `example`. If the domain name is `example.com`, return `example.com`. If the domain name is `www.example.com`, then return `example.com`. Seems like a pretty clear pattern. You might call it overengineering, but I call it writing reusable code. – mason Apr 09 '15 at 16:12
  • You're missing the point. My approach takes care of every one of the cases you mentioned (and the rest of the cases the OP mentioned). Where the question of the future-proof pattern comes in is in distinguishing between what is valid accounting for local DNS, registered domains (present & future), plus possible subdomains. That's where the question of use case comes in. In general, if you have a one-line solution that works and is readable, but you advocate for a more bloated, complicated approach, you're writing bad code. Even if there is a viable pattern, I'd still consider the regex method. – Colin Apr 09 '15 at 16:45
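
For comparison, the structural pattern mason describes in the comments above (no dot: return the host; one dot: return the host; more: return the last two labels) could be sketched roughly as follows. This is only an illustration under stated assumptions: `HostHelper` and `GetBaseDomain` are hypothetical names, not a built-in API, and the sketch assumes single-label suffixes, so a host ending in `.co.uk` would come back as `co.uk`. Handling those correctly would need a list of known suffixes.

```csharp
using System;

public static class HostHelper
{
    // Hypothetical helper (not part of .NET): returns the last two labels of the host,
    // or the whole host when it has two labels or fewer.
    // Assumption: the suffix is a single label, so "a.example.co.uk" yields "co.uk".
    public static string GetBaseDomain(Uri uri)
    {
        string host = uri.Host;

        // Leave IP addresses untouched; only DNS-style names are split.
        if (uri.HostNameType != UriHostNameType.Dns)
            return host;

        string[] labels = host.Split('.');
        if (labels.Length <= 2)
            return host;

        return labels[labels.Length - 2] + "." + labels[labels.Length - 1];
    }
}
```

With the inputs from the question, `HostHelper.GetBaseDomain(new Uri("http://sub.example.com"))` gives `example.com`, and `http://example:1000` and `http://localhost:1000` give `example` and `localhost`.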