First, check whether the image's HTML tag has a width attribute. If it's above 40, skip it. As Matthew mentioned, this will get false positives where someone sized a large image down to 40px wide, but that's no big deal; the point of this step is to quickly weed out the first dozen or so images that are obviously too big.
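As a rough sketch of that first pass (assuming you already have the page's HTML in a string; the DOMDocument approach and the $html variable are my own illustration, not something fixed by this answer):

// Collect candidate images whose width attribute is 40px or less.
// Images with no width attribute at all are kept as candidates,
// since we can't rule them out from the markup alone.
function get_candidate_images($html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from sloppy markup
    $candidates = array();
    foreach ($doc->getElementsByTagName('img') as $img) {
        $width = $img->getAttribute('width');
        if ($width !== '' && (int) $width > 40) {
            continue; // obviously too big, skip it
        }
        $candidates[] = $img->getAttribute('src');
    }
    return $candidates;
}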
Once the script catches an image that SAYS it's under 40px wide, check the HTTP headers to estimate a general width from the file size. This is faster than getimagesize() because you don't have to download the image itself to get the info.
function get_image_bytes($path) {
    // Fetch the headers as an associative array instead of relying on
    // Content-Length sitting at a fixed numeric index.
    $headers = get_headers($path, 1);
    return isset($headers['Content-Length']) ? (int) $headers['Content-Length'] : false;
}

$imageBytes = get_image_bytes('test1.jpg');

// I'm going to guess 40x80 is about 2000 bytes (~2KB)
$cutoffSize = 2000;

if ($imageBytes !== false && $imageBytes < $cutoffSize) {
    // this is the one!
} else {
    // it was a phoney, keep scraping
}
Setting the cutoff at 2000 bytes will also let through images that are 100x30, which isn't good.
However, at this point you've weeded out most of the huge 800KB files that would really slow you down, and because we know this one is under 2KB, it's not too taxing to test it with getimagesize() to get an accurate width.
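The final check could look something like this (a minimal sketch; getimagesize() returns the width and height as the first two elements of its result array, or false on failure):

// Only run getimagesize() on files that passed the byte cutoff,
// since by now we know they're small, cheap downloads.
$size = getimagesize('test1.jpg');
if ($size !== false && $size[0] <= 40) {
    // width confirmed at 40px or under - this is the one!
}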
You can tweak the process depending on how picky you are about the 40px mark; as usual, higher accuracy takes more time, and vice versa.