For example:

google.com -> .com

google.co.id -> .co.id

hello.google.co.id -> .co.id

How can I get this in VB.NET?

Can that even be done?

Hanky Panky
user4951
  • You can't just do this auto-magically. If feasible, I would use a list of TLDs from a good source or even use a library. You can't do it with something like a regex, because sometimes you want the string after the last dot and sometimes you want the last two labels. – Gray Oct 30 '13 at 14:00
  • You can use System.IO.FileInfo to get the .com extension, but how would it know that google is the domain and that you want everything after it? – Steve Oct 30 '13 at 14:01
  • Is there no special VB.NET function for that? – user4951 Oct 30 '13 at 14:04
  • @JimThio There can't really be one, because this is a list that changes from time to time - there is no absolute way of *calculating* it. – Gray Oct 30 '13 at 14:05
  • @Robert Actually, what he is looking for is the [public suffix](http://en.wikipedia.org/wiki/Public_Suffix_List), not the TLD. I made the same mistake. – Gray Oct 30 '13 at 14:12
  • The TLD of *google.co.id* is *.id*, not *.co.id*. So you're not looking for TLDs like Robert said. You'll have to get a list of ccTLDs with a second-level hierarchy (like .id and .uk). – sloth Oct 30 '13 at 14:13
  • Actually, I only need the .id part. I want to know which whois server to ask. – user4951 Oct 30 '13 at 15:26
  • I would put it in a Uri (for validation), then from the Host property pull what's after the last dot. To get the full TLD, you'll need to find a list. – the_lotus Oct 30 '13 at 18:30
  • Possible duplicate of [Get the subdomain from a URL](http://stackoverflow.com/questions/288810/get-the-subdomain-from-a-url) – Gray Oct 30 '13 at 20:41

1 Answer

Assuming that domains with more than one "." always include the ".co." bit, you can use this code:

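Dim input As String = "hello.google.co.id"
Dim extension As String = ""
' Look for ".co." (case-insensitive); everything from that point on is the extension.
Dim coIndex As Integer = input.IndexOf(".co.", StringComparison.OrdinalIgnoreCase)
If coIndex >= 0 Then
    extension = input.Substring(coIndex)
Else
    ' Otherwise fall back to the text after the last dot (e.g., ".com").
    extension = System.IO.Path.GetExtension(input)
End If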

UPDATE

As pointed out in the comments, the code above does not account for quite a few cases (e.g., .ca.us). The version below starts from a different assumption: a two-part extension (.xx.yy) is present only when the last two labels are both 2 characters long. This should take care of the alternatives raised so far:

' Remove a leading "www." if present.
If input.Length > 4 AndAlso input.StartsWith("www.", StringComparison.OrdinalIgnoreCase) Then input = input.Substring(4)

Dim temp() As String = input.Split("."c)

If temp.Length > 2 AndAlso temp(temp.Length - 1).Length = 2 AndAlso temp(temp.Length - 2).Length = 2 Then
    ' The last two labels are both 2 characters long (co.id, ca.us, etc.): keep both.
    extension = "." & temp(temp.Length - 2) & "." & temp(temp.Length - 1)
Else
    ' Otherwise keep only the text after the last dot.
    extension = System.IO.Path.GetExtension(input)
End If

In any case, domain suffixes have to be handled case by case, so this code (built on my current, fairly limited understanding of the situation) cannot be considered 100% reliable. Some cases cannot even be identified without knowing whether a given set of characters is an extension or not; for example: "hello.ue.co". At least in certain cases, this analysis should be complemented with a function that checks whether the extracted extension is valid (e.g., against a dictionary of valid, although not obvious, extensions).
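As a rough illustration of that dictionary check, here is a minimal sketch. The `KnownSuffixes` set and the `GetSuffix` function are hypothetical names, and the entries are a tiny hand-picked sample; a real implementation would load the full Public Suffix List (mentioned in the comments on the question) and would also need to handle its wildcard and exception rules, which this sketch ignores.

```vbnet
Imports System
Imports System.Collections.Generic

Module SuffixSketch
    ' Hypothetical sample only; a real list would be loaded from the Public Suffix List.
    Private ReadOnly KnownSuffixes As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase) From {
        "com", "co", "ca", "id", "co.id", "us", "ca.us", "ny.us"
    }

    ' Returns the longest known suffix of host (with a leading dot), or "" if none matches.
    Function GetSuffix(host As String) As String
        Dim labels() As String = host.Trim("."c).Split("."c)
        ' Start at 1 so the whole host is never treated as its own suffix.
        ' Candidates go from longest to shortest, so the first match is the longest one.
        For start As Integer = 1 To labels.Length - 1
            Dim candidate As String = String.Join(".", labels, start, labels.Length - start)
            If KnownSuffixes.Contains(candidate) Then Return "." & candidate
        Next
        Return ""
    End Function

    Sub Main()
        Console.WriteLine(GetSuffix("hello.google.co.id")) ' prints .co.id
        Console.WriteLine(GetSuffix("icecream.dq.ca"))     ' prints .ca
        Console.WriteLine(GetSuffix("hello.ue.co"))        ' prints .co
    End Sub
End Module
```

Because the lookup depends only on the list, cases like "icecream.dq.ca" or "hello.ue.co" are resolved by what the list contains rather than by the length of the labels.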

varocarbas
  • This may work for OP, but keep in mind that there are a lot of exceptions to this. For example, `.ca.us` or `.ny.us`. There are also addresses like `x.co`. See: http://en.wikipedia.org/wiki/.co – Gray Oct 30 '13 at 14:50
  • @Gray Thanks for letting me know. I had no idea, to be honest; that's why I wrote my assumption at the top. I will update my answer right now with this information. – varocarbas Oct 30 '13 at 14:57
  • @Gray The updated version takes care of any xx.yy case and does not rely on .co at all. If you know of further situations still not accounted for by this code, please let me know. – varocarbas Oct 30 '13 at 15:11
  • I'll +1 this because this is good for discussion. We could go back and forth and find all the exceptions, but you list your assumptions in here, and it could be that something like this would meet the needs of the OP. As you said, a perfect solution is going to require checking a list of valid suffixes. – Gray Oct 30 '13 at 15:23
  • @Gray Thanks. Exactly; but if you find any other case which is not accounted for by my solution, just let me know and I will be more than happy to update the code. – varocarbas Oct 30 '13 at 15:40
  • Well, one example would be something like dq.ca. This is the Dairy Queen website - but the suffix in this case is just `.ca`! So, `www.dq.ca` would return dq.ca, right? I feel like there are just too many possibilities to get this perfect. It all depends on how crucial details like this are to the asker. – Gray Oct 30 '13 at 20:27
  • @Gray Actually, I wrote the code on the assumption that the URLs would be provided without www. (dq.ca is handled perfectly). But I have updated it, and it now accounts for URLs both with and without a leading www. Thanks again :) – varocarbas Oct 30 '13 at 20:37
  • Fair enough, but if dq.ca (or any site like that - are there others?) added a sub-domain like `icecream.dq.ca`, it would again output dq.ca, right? Check this question: http://stackoverflow.com/questions/288810/get-the-subdomain-from-a-url I found it helpful. – Gray Oct 30 '13 at 20:39
  • @Gray Yes. This is the limitation I talked about in my description. There is nothing you can do about that; not even a person can know what to do in this case. You only know it because you are sure that dq is not a valid extension (= a dictionary or equivalent is required for 100% accuracy). – varocarbas Oct 30 '13 at 20:42