
I have a large array of, say, 500 URLs. Using array_unique() I can remove exact duplicates. However, I also want to remove any URL whose domain already appears earlier in the array, keeping only the first URL for each domain (so that each domain ends up in the array once).

I have been using the following, but it only removes exact duplicate values:

$directurls = array_unique($directurls);

I have been experimenting with the following to get the domain, but I am not sure how to check the rest of the array for other URLs whose parse_url() host is the same:

foreach ($directurls as $url) {
    $parse = parse_url($url);
    print $parse['host']; // the domain name; I just need a way to check it against the others and remove duplicates
}

I imagine I will need some form of loop that takes the current host and checks it against all the other hosts in the array, removing any duplicates it finds and keeping the current value. Maybe something like this could work; I am testing it now:

foreach($directurls as $url) {
    $parse = parse_url($url);
    if (in_array($parse['host'], $directurls)) {
        //just looking for a way to remove while keeping unique
    }
}

If anyone has any suggestions or recommendations on other ways to go about this I would be very thankful.

Let me know if I need to explain a bit more.

gen_Eric
Simon Staton
  • Look at array_filter() – GordonM Oct 29 '13 at 16:02
  • How could you tell which URL is `the original domain`? Say, for example, you have `gmail.google.com` and `plus.google.com`. – HamZa Oct 29 '13 at 16:03
  • It does not have to be so exact, but ideally I want to strip out `example.com/apage`, `example.com/anotherpage`, `example.com/yetanotherpage` so my array is not getting filled with 100 URLs that are all on the same domain. I will look at array_filter now, thanks. – Simon Staton Oct 29 '13 at 16:04
  • @SimonStaton what about you convert the whole array to an "only domain array" and then use `array_unique()` ? – HamZa Oct 29 '13 at 16:07
  • I thought about that too, but the only problem is I need to keep the original url string. I am just looking at `array_filter()` but maybe I could convert it into a new array, then compare the two in some way. Your suggestion has led me to [this post](http://stackoverflow.com/questions/4260086/php-how-to-use-array-filter-to-filter-array-keys) – Simon Staton Oct 29 '13 at 16:08
  • @SimonStaton: Here's a quick try: https://eval.in/58441 -- does that answer your question? – Amal Murali Oct 29 '13 at 16:13
  • @AmalMurali that sure does, I am very new to array functions so am reading about the functions you used :) thanks for the help. Feel free to make that an answer and I will accept it for other people in the future. – Simon Staton Oct 29 '13 at 16:16

1 Answer


You could avoid writing an explicit loop by using array_map() with a callback function. Grab the domain of each URL with parse_url() to build a new array containing just the domains. Then use array_combine() to create an array with the URLs as keys and the domains as values, and call array_unique() on it; for each domain, array_unique() keeps the first entry it encounters. Finally, to get just the URLs into a new array, use array_keys():

$domains = array_map(function($d) {
    $parts = parse_url($d);    // or: parse_url($d)['host'] if PHP >= 5.4
    return $parts['host'];
}, $directurls);

$result = array_keys(array_unique(array_combine($directurls, $domains)));

Demo!
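If you would rather use the explicit loop the question was heading towards, here is a minimal sketch of that approach. The function name uniqueByHost() and the sample URLs are made up for illustration and are not part of the original code. It also assumes that entries without a parseable host should simply be skipped, and that hosts should be compared case-insensitively; adjust both if that does not suit your data.

```php
<?php
// Hypothetical helper: keeps the first URL seen for each host.
function uniqueByHost(array $urls)
{
    $seen = [];    // hosts already encountered, used as a lookup set
    $result = [];  // first URL found for each host, in original order

    foreach ($urls as $url) {
        // With PHP_URL_HOST, parse_url() returns the host string,
        // null when there is no host, or false on a malformed URL.
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host)) {
            continue; // assumption: drop entries without a host
        }

        $host = strtolower($host); // assumption: hosts compare case-insensitively
        if (!isset($seen[$host])) {
            $seen[$host] = true;
            $result[] = $url;
        }
    }

    return $result;
}

// Sample data, made up for illustration.
$directurls = [
    'http://example.com/apage',
    'http://example.com/anotherpage',
    'http://example.org/',
    'not a url',
];

$unique = uniqueByHost($directurls);
```

Unlike the array_combine() version, this does not raise a notice when parse_url() cannot find a 'host' key, and it compares full hosts only, so `gmail.google.com` and `plus.google.com` would still count as different domains.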

Amal Murali