I have a routine that looks at a base domain URL (http://www.site.com), finds all the links, and then finds all the images and their attributes for each page. This is done in two nested for loops:
- an outer loop over the links, and
- an inner loop, run once per link, over each image found on that page.
I've been using my band's website as a test bed, and each page has a "spotlight" section of featured articles at the top, which is set up as an image slider, so the same slider images appear on every page. I only want unique image URLs for a website, but everything I have tried still lets duplicates through. I tried doing the dupe check while building the array, but that was fruitless. Then I found the question "How to remove duplicate values from a multi-dimensional array in PHP" and a comment on it, but this does not work either.
Let's start with a sample array of data I scraped from my band's website:
Array
(
[http://darwenstheory.com/] => Array
(
[0] => Array
(
[3] => Array
(
[url] => http://darwenstheory.com/images/dtheory-spotlight-vidclips.jpg
[alt] => Ventura Theater Video Clips Posted!
[w] => 644
[h] => 202
[ratio] => 3.2
)
)
[1] => Array
(
[3] => Array
(
[url] => http://darwenstheory.com/images/dtheory-spotlight-vtpix.jpg
[alt] => Video Clips Posted!
[w] => 644
[h] => 202
[ratio] => 3.2
)
)
[2] => Array
(
[3] => Array
(
[url] => http://darwenstheory.com/images/dtheory-spotlight-merch.jpg
[alt] => Photos from Ventura Theater!
[w] => 644
[h] => 202
[ratio] => 3.2
)
)
[3] => Array
(
[4] => Array
(
[url] => http://darwenstheory.com/wp-content/uploads/2011/10/peepdestroyflyer.jpg
[alt] =>
[w] => 533
[h] => 800
[ratio] => 0.7
)
)
)
[http://darwenstheory.com/2011/01/11/ventura-theater-video-clips-posted/] => Array
(
[0] => Array
(
[3] => Array
(
[url] => http://darwenstheory.com/images/dtheory-spotlight-vidclips.jpg
[alt] => Ventura Theater Video Clips Posted!
[w] => 644
[h] => 202
[ratio] => 3.2
)
)
[1] => Array
(
[3] => Array
(
[url] => http://darwenstheory.com/images/dtheory-spotlight-vtpix.jpg
[alt] => Video Clips Posted!
[w] => 644
[h] => 202
[ratio] => 3.2
)
)
[2] => Array
(
[3] => Array
(
[url] => http://darwenstheory.com/images/dtheory-spotlight-merch.jpg
[alt] => Photos from Ventura Theater!
[w] => 644
[h] => 202
[ratio] => 3.2
)
)
)
)
In the array above, the second top-level key (the URL of a sub-page on the domain) should not contain those three image URLs, because they were already recorded for the first page. Here is a simplified version of what I am using to build the array:
foreach ($links as $link)
{
    $images = get_page_images($link); // array of images found on this page
    foreach ($images as $image)
    {
        // I have some things here to set up a "score" (and $ratio) for each image
        $data['scrape'][$link][][$score] = array(
            'url'   => $image['url'],
            'alt'   => $image['alt'],
            'w'     => $image['w'],
            'h'     => $image['h'],
            'ratio' => $ratio, // keyed so it shows up as [ratio] in the dump above
        );
    }
}
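For illustration, here is a minimal sketch of the dupe check done while building the array, using a lookup keyed by image URL that is shared across all pages. The `get_page_images()` stub and its sample data, the `$seen` array, the fixed `$score`, and the `round(w / h, 1)` ratio are all assumptions standing in for the real routine, which is not shown here.

```php
<?php
// Hypothetical stand-in for the real scraper; the sample data is made up
// to mimic a slider image that repeats on every page.
function get_page_images($link)
{
    $pages = array(
        'http://www.site.com/' => array(
            array('url' => 'http://www.site.com/images/slider.jpg', 'alt' => 'Slider', 'w' => 644, 'h' => 202),
            array('url' => 'http://www.site.com/images/flyer.jpg', 'alt' => '', 'w' => 533, 'h' => 800),
        ),
        'http://www.site.com/news/' => array(
            array('url' => 'http://www.site.com/images/slider.jpg', 'alt' => 'Slider', 'w' => 644, 'h' => 202),
            array('url' => 'http://www.site.com/images/news.jpg', 'alt' => 'News', 'w' => 300, 'h' => 300),
        ),
    );
    return isset($pages[$link]) ? $pages[$link] : array();
}

$links = array('http://www.site.com/', 'http://www.site.com/news/');
$data  = array('scrape' => array());
$seen  = array(); // image URL => true, shared across ALL pages

foreach ($links as $link) {
    foreach (get_page_images($link) as $image) {
        if (isset($seen[$image['url']])) {
            continue; // this URL was already recorded on an earlier page
        }
        $seen[$image['url']] = true;

        $ratio = round($image['w'] / $image['h'], 1); // assumed ratio formula
        $score = 3; // placeholder; the real scoring logic is not shown

        $data['scrape'][$link][][$score] = array(
            'url'   => $image['url'],
            'alt'   => $image['alt'],
            'w'     => $image['w'],
            'h'     => $image['h'],
            'ratio' => $ratio,
        );
    }
}

print_r($data['scrape']);
```

The key point in this sketch is that `$seen` lives outside both loops. A check that only looks at the current page's entries would not catch an image that was stored under a different page key.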
I have a feeling I am over-complicating this, but I have no idea how or why. I'm here to learn, whether it's me being stupid or something else.
I would just like the above array I am building to not have a duplicate value for the 'url' key in the deepest-level array.
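To make the desired result concrete, here is a rough sketch of filtering an already-built array of this shape (page URL, then entry index, then score, then image) so that each 'url' is kept only the first time it appears. The helper name `remove_duplicate_image_urls()` and the sample data are hypothetical. Generic multi-dimensional de-duplication usually compares whole sub-arrays, which is probably why it lets these through: the repeated images sit under different page keys.

```php
<?php
// Hypothetical helper: keeps the first occurrence of each 'url' in a
// page => entries => score => image structure.
function remove_duplicate_image_urls(array $scrape)
{
    $seen   = array(); // image URL => true
    $result = array();

    foreach ($scrape as $page => $entries) {
        foreach ($entries as $entry) {
            foreach ($entry as $score => $image) {
                if (isset($seen[$image['url']])) {
                    continue; // duplicate of an image seen earlier
                }
                $seen[$image['url']] = true;
                $result[$page][][$score] = $image;
            }
        }
    }

    return $result;
}

// Made-up sample data in the same shape as the scraped array.
$scrape = array(
    'http://www.site.com/' => array(
        array(3 => array('url' => 'http://www.site.com/images/slider.jpg', 'alt' => 'Slider')),
        array(4 => array('url' => 'http://www.site.com/images/flyer.jpg', 'alt' => '')),
    ),
    'http://www.site.com/news/' => array(
        array(3 => array('url' => 'http://www.site.com/images/slider.jpg', 'alt' => 'Slider')),
    ),
);

$unique = remove_duplicate_image_urls($scrape);
print_r($unique);
```

Note that in this sketch a page whose images were all duplicates is dropped from the result entirely, and the entry indexes under each page are renumbered from 0.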
Thank you so, so much in advance for criticism, help, and everything.