1

I am trying to obtain the root domain of a URL stored in an NSString. For example, if the URL passed is secure.twitter.com, I want twitter.com to be returned. The method I wrote below does this, but it does not work for some longer URLs.

Here's my method:

- (NSString *)getRootDomain:(NSString *)domain
{
    NSString*output = [NSString stringWithString:domain];

    if ([output rangeOfString:@"www."].location != NSNotFound)
    {
    //if the www is still there, get rid of it
    output = [domain stringByReplacingOccurrencesOfString:@"www." withString:@""];
    }

    if ([output rangeOfString:@"http://"].location != NSNotFound)
    {
    //if the http is still there, get rid of it
    output = [domain stringByReplacingOccurrencesOfString:@"http://" withString:@""];
    }

    if ([output rangeOfString:@"https://"].location != NSNotFound)
    {
    //if the https is still there, get rid of it
    output = [domain stringByReplacingOccurrencesOfString:@"https://" withString:@""];
    }

    NSLog(@"New: %@",output);

    NSArray*components = [output componentsSeparatedByString:@"."];

    if ([components count] == 2) //dandy, this is an easy one
    {
        return output;
    }

    if ([components count] == 3) //secure.paypal.com
    {
        NSString*newurl = [NSString stringWithFormat:@"%@.%@",[components objectAtIndex:1],[components objectAtIndex:2]];

        return newurl;
    }

    if ([components count] == 4) //secure.paypal.co.uk
    {
        NSString*newurl = [NSString stringWithFormat:@"%@.%@.%@",[components objectAtIndex:1],[components objectAtIndex:2],[components objectAtIndex:3]];

        return newurl;
    }


    //Path Components will return the root url in its array in object 0 (usually)

    NSArray*path_components = [output pathComponents];  

    return [path_components objectAtIndex:0];
}

How can I make this work for any URL?

Pripyat
  • Can you provide some more test data? What do you mean by "longer URLs"? – Dave DeLong Feb 17 '11 at 21:10
  • URLs such as these: http://www.dailymail.co.uk/news/article-1353859/Student-calls-911-ask-trouble-growing-marijuana--triggers-arrest.html – Pripyat Feb 17 '11 at 23:01

6 Answers

4

You could consider taking advantage of NSURL and NSString to do this, like so:

- (NSString *)getRootDomain:(NSString *)domain
{
    // Return nil if none found.
    NSString * rootDomain = nil;

    // Convert the string to an NSURL to take advantage of NSURL's parsing abilities.
    NSURL * url = [NSURL URLWithString:domain];

    // Get the host, e.g. "secure.twitter.com"
    NSString * host = [url host];

    // Separate the host into its constituent components, e.g. [@"secure", @"twitter", @"com"]
    NSArray * hostComponents = [host componentsSeparatedByString:@"."];
    if ([hostComponents count] >=2) {
        // Create a string out of the last two components in the host name, e.g. @"twitter" and @"com"
        rootDomain = [NSString stringWithFormat:@"%@.%@", [hostComponents objectAtIndex:([hostComponents count] - 2)], [hostComponents objectAtIndex:([hostComponents count] - 1)]];
    }

    return rootDomain;
}
mxg
Ryan
  • What happens when it doesn't work? And can you give examples for which it doesn't work? – Ryan Feb 20 '11 at 06:05
  • For example: `secure.twitter.co.uk` would come back `co.uk` not `twitter.co.uk`. Maybe that’s the desired outcome though… – Ben Cochran Feb 23 '11 at 23:08
  • Sure, your code needs to know what it's looking for. Alterations could be made to make this method detect a .co.uk domain vs. a .com domain. A change like that would be trivial. – Ryan Feb 24 '11 at 19:27
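The adjustment discussed in these comments could be sketched roughly as follows. This is only a minimal sketch: the function name `RootDomainFromHost` and the tiny `multiPartSuffixes` set are assumptions made for illustration, and a real implementation would need a far more complete list of multi-part suffixes.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: trims a host name down to its root domain.
// The suffix set is a small illustrative sample, not a complete list.
static NSString *RootDomainFromHost(NSString *host)
{
    NSSet *multiPartSuffixes = [NSSet setWithObjects:@"co.uk", @"org.uk", @"com.au", nil];

    // Host names are case-insensitive, so compare in lowercase.
    NSString *lower = [host lowercaseString];
    NSArray *labels = [lower componentsSeparatedByString:@"."];
    NSUInteger count = [labels count];

    // Fewer than two labels (or a nil host): nothing to trim.
    if (count < 2) {
        return lower;
    }

    NSString *lastTwo = [NSString stringWithFormat:@"%@.%@", labels[count - 2], labels[count - 1]];

    // If the last two labels form a known two-part suffix, keep one more label.
    if (count >= 3 && [multiPartSuffixes containsObject:lastTwo]) {
        return [NSString stringWithFormat:@"%@.%@", labels[count - 3], lastTwo];
    }

    return lastTwo;
}
```

With this sketch, `RootDomainFromHost(@"secure.twitter.co.uk")` gives `twitter.co.uk`, while `RootDomainFromHost(@"secure.twitter.com")` gives `twitter.com`. Any suffix missing from the sample set falls back to the last two labels.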
0

For getting the parts of a URL, here is the perfect answer. However, you want to split the host part of the URL. With NSURL you can get the host like this: url.host

From the host alone, there is no way to know which part is the important one. There could be one host named secure.twitter.com and another named twitter.host1.com.

If you have a custom requirement, such as removing "secure." when it is a host prefix, implement it. But if you are looking for a universal solution, I would rather store the whole host string.

Daniel
  • I ended up creating my own solution that can cut any URL to simply its host component. It's pretty CPU intense, but it works. NSURL is pretty unreliable in this as it expects a complete URL, which I don't always have. Sometimes I have `.google.com`, sometimes `www.google.com`, sometimes `http://www.google.com`. Too messy for NSURL. – Pripyat Jun 11 '12 at 14:04
  • You could complete the URL yourself if you know what you expect it to be for partial URLs. For example if you know that it will always begin with `http://www` for URLs like .google.com. – Daniel Jun 11 '12 at 14:10
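Completing a partial URL before handing it to NSURL, as suggested in the comment above, might look something like this. It is a sketch under stated assumptions: the function name `HostFromLooseURLString` is made up for illustration, and input without a scheme is assumed to be an http URL.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: extracts the host from a full or partial URL string.
// Assumption: input that lacks a scheme is meant to be an http URL.
static NSString *HostFromLooseURLString(NSString *input)
{
    NSString *candidate = [input stringByTrimmingCharactersInSet:
                           [NSCharacterSet whitespaceAndNewlineCharacterSet]];

    // Drop a leading dot, as in ".google.com".
    if ([candidate hasPrefix:@"."]) {
        candidate = [candidate substringFromIndex:1];
    }

    // Nothing usable left (this also covers a nil input).
    if ([candidate length] == 0) {
        return nil;
    }

    // Without a scheme, NSURL treats the text as a path and reports no host,
    // so prepend one when it is missing.
    if ([candidate rangeOfString:@"://"].location == NSNotFound) {
        candidate = [@"http://" stringByAppendingString:candidate];
    }

    return [[NSURL URLWithString:candidate] host];
}
```

All three inputs mentioned in the comments, `.google.com`, `www.google.com` and `http://www.google.com`, then go through the same NSURL parsing step; the first yields `google.com` and the other two yield `www.google.com`.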
0

The first dot in a URL is always part of the host name, and we can use this to build a simple method. It returns the whole host, so subdomains and multi-part TLDs such as co.uk are kept intact.

-(NSString*)domainFromUrl:(NSString*)url
{
    NSArray *first = [url componentsSeparatedByString:@"/"];
    for (NSString *part in first) {
        if ([part rangeOfString:@"."].location != NSNotFound){
            return part;
        }
    }
    return nil;
}
NSLog(@"%@",[self domainFromUrl:@"http://foobar1.com/foo/"]);
NSLog(@"%@",[self domainFromUrl:@"http://foobar2.com/foo/bar.jpg"]);
NSLog(@"%@",[self domainFromUrl:@"http://foobar3.com/"]);
NSLog(@"%@",[self domainFromUrl:@"http://foobar4.com"]);
NSLog(@"%@",[self domainFromUrl:@"foobar5.com"]);

Output:

2012-08-15 23:25:14.769 SandBox[9885:303] foobar1.com
2012-08-15 23:25:14.772 SandBox[9885:303] foobar2.com
2012-08-15 23:25:14.772 SandBox[9885:303] foobar3.com
2012-08-15 23:25:14.773 SandBox[9885:303] foobar4.com
2012-08-15 23:25:14.773 SandBox[9885:303] foobar5.com
valexa
0

You need to check the suffixes.

Example:

#import "NSURL+RootDomain.h"

@implementation NSURL (RootDomain)

- (NSString *)rootDomain {
    NSArray *hostComponents = [self.host componentsSeparatedByString:@"."];
    // Second-level labels that combine with "cn" into a two-part suffix, e.g. "com.cn".
    NSArray *suffixes = [NSArray arrayWithObjects:@"net", @"com", @"gov", @"org", @"edu", @"me", nil];
    NSUInteger count = [hostComponents count];

    if (count >= 2) {
        NSString *last = hostComponents[count - 1];
        NSString *secondLast = hostComponents[count - 2];

        // e.g. "www.example.com.cn" -> "example.com.cn"; needs at least three components.
        if (count >= 3 && [last isEqualToString:@"cn"] && [suffixes containsObject:secondLast]) {
            return [NSString stringWithFormat:@"%@.%@.%@", hostComponents[count - 3], secondLast, last];
        }

        // e.g. "www.example.com" -> "example.com"
        return [NSString stringWithFormat:@"%@.%@", secondLast, last];
    }

    return self.host;
}

@end
Xiong
0
[[NSURL URLWithString:@"http://someurl.com/something"] host]

output: someurl.com

Yaro
0
NSArray *array = [[newURL host] componentsSeparatedByString: @"."];
NSLog(@"%@", [NSString stringWithFormat:@"%@.%@", [array objectAtIndex:[array count]-2], [array objectAtIndex:[array count]-1]]);
MHC