I want to open a URL and use a regex to extract all the image URLs from the page. Then I want to cURL each of them and check its size. In the end I want to get the biggest one. How do I do this?
3 Answers
You could start by fetching the page with cURL and saving the HTML in a variable.
Then you could apply a regex like this one: <img[^>]*?src=['"]([^'"]*)['"]
Check whether each source starts with http or is a relative link; if it's a relative link, you can prepend the URL of the page.
Finally, get the size of the images using getimagesize() http://php.net/manual/en/function.getimagesize.php (note that it reports pixel dimensions, not the file size in bytes).
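The extraction and relative-link steps above could be sketched roughly as follows. The function name extractImageUrls() is made up for illustration, and the URL resolution is deliberately simplified: it assumes the page URL has a scheme and host, and it ignores ports, "../" segments and <base> tags.

```php
<?php
// Hypothetical helper (not from the answer): pull src values out of <img>
// tags with a regex and turn relative links into absolute ones.
function extractImageUrls(string $html, string $pageUrl): array
{
    // Match src="..." or src='...' anywhere inside an <img ...> tag.
    preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*["\']([^"\']*)["\']/i', $html, $matches);

    // Assumes $pageUrl is absolute, e.g. http://example.com/gallery/index.html
    $parts  = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    // Directory of the page, e.g. "/gallery/" for "/gallery/index.html".
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    $urls = [];
    foreach ($matches[1] as $src) {
        if (preg_match('#^https?://#i', $src)) {
            $urls[] = $src;                           // already absolute
        } elseif (strpos($src, '//') === 0) {
            $urls[] = $parts['scheme'] . ':' . $src;  // protocol-relative
        } elseif (strpos($src, '/') === 0) {
            $urls[] = $origin . $src;                 // root-relative
        } else {
            $urls[] = $origin . $dir . $src;          // relative to the page directory
        }
    }
    return $urls;
}
```

Calling extractImageUrls($html, 'http://example.com/gallery/index.html') would return a flat array of absolute URLs that can then be passed to getimagesize() or a size check.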

hehe trust me i get the point, I went through the complete CodingHorrors article :) – gX. Mar 20 '10 at 18:10
Use the php DOM to find the images.
I have not tested this code at all, but it should get you headed in the right direction.
$urls = array();
// loadHTML() is an instance method; calling it statically fails on PHP 8.
$dom = new DOMDocument();
// YOUR_HTML is a placeholder for the fetched page source.
// The @ suppresses warnings about malformed real-world HTML.
@$dom->loadHTML(YOUR_HTML);
$imgList = $dom->getElementsByTagName('img');
foreach ($imgList as $imgElement) {
    if ($imgElement->hasAttribute('src')) {
        $urls[] = $imgElement->getAttribute('src');
    }
}
If you want to get linked images, you can change 'img'/'src' to 'a'/'href'. But you will need to find a way to filter the list to get only images.
You did not say what your criterion for image size is, so I can't help you there. Do you want maximum file size or resolution?
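One possible way to filter a list of links down to images is to look at the file extension. This is only a sketch: filterImageLinks() and the extension list are assumptions for illustration, and an extension check will miss images served from URLs without one.

```php
<?php
// Hypothetical filter (not from the answer): keep only URLs whose path
// ends in a common image extension.
function filterImageLinks(array $urls): array
{
    $extensions = ['jpg', 'jpeg', 'png', 'gif', 'webp']; // assumed list, adjust as needed
    $images = [];
    foreach ($urls as $url) {
        // Look only at the path so query strings such as ?v=2 are ignored.
        $path = (string) parse_url($url, PHP_URL_PATH);
        $ext  = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $extensions, true)) {
            $images[] = $url;
        }
    }
    return $images;
}
```

A stricter check would request each link and inspect the Content-Type header of the response, at the cost of one request per link.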

It might already be obvious by now: use a DOM parser, not a regex. Just get all elements by tag name <img> and then read each one's URL from its src attribute. To determine an image's file size without downloading the entire image, you'd probably want to send an HTTP HEAD request instead and read the Content-Length header of the response. The http_head() function may be useful for this.
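As a rough sketch of the HEAD approach, here is a version that uses the cURL extension rather than http_head(). The helper names remoteFileSize() and largestImage() are made up for illustration, and the code assumes the server actually reports a Content-Length for HEAD requests, which not every server does.

```php
<?php
// Hypothetical helper: request headers only and read the reported length.
// Returns null when the request fails or no Content-Length is reported.
function remoteFileSize(string $url): ?int
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // send HEAD instead of GET
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // do not echo the response
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects to the real file
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $ok = curl_exec($ch) !== false;
    // cURL reports -1 when the length is unknown.
    $length = (int) curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
    return ($ok && $length >= 0) ? $length : null;
}

// Hypothetical helper: given a [url => bytes or null] map, return the URL
// with the largest known size, or null if no size is known.
function largestImage(array $sizes): ?string
{
    $known = array_filter($sizes, 'is_int');  // drop unknown sizes
    if ($known === []) {
        return null;
    }
    return array_search(max($known), $known, true);
}
```

With a list of image URLs in $urls, one could build the map with a loop such as foreach ($urls as $u) { $sizes[$u] = remoteFileSize($u); } and then call largestImage($sizes).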
