I have written a simple web crawler with PHP cURL that should grab all the images from the Amazon results page for the search keyword samsung.
Here is the code:
$curl = curl_init(); // $curl is going to be data type curl resource
$search_string = "samsung";
$url = "https://www.amazon.com/s?k=" . urlencode($search_string); // "k=" is the search query parameter
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false); // skip TLS certificate verification (insecure, testing only)
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the response body as a string instead of printing it
$result = curl_exec($curl);
preg_match_all('!https://m\.media-amazon\.com/images/I/[^\s]*?\._AC_UL320_\.jpg!', $result, $matches); // dots escaped to match a literal "."
print_r($matches);
curl_close($curl);
But I get an empty array:
Array ( [0] => Array ( ) )
I don't know why it returns nothing. If you know what is going wrong or how I can handle this, please let me know. I would really appreciate any ideas.
Thanks in advance.
Note that I used [^\s]*? in the regular expression instead of a specific image name, so that it matches every image available on the page.
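To check the pattern separately from the network request, it can be run against a small hand-written string. This is only a sketch: the helper name extract_image_urls and the sample markup are made up for illustration and are not real Amazon output. The dots are escaped here so that they match a literal "." rather than any character.

```php
<?php
// Hypothetical helper: collect thumbnail URLs that match the pattern from the question.
function extract_image_urls(string $html): array
{
    // [^\s]*? lazily matches any run of non-whitespace characters (the image name).
    $pattern = '!https://m\.media-amazon\.com/images/I/[^\s]*?\._AC_UL320_\.jpg!';
    preg_match_all($pattern, $html, $matches);
    return $matches[0]; // full matches only; an empty array when nothing matched
}

// Made-up sample markup, used only to exercise the pattern.
$sample = '<img src="https://m.media-amazon.com/images/I/abc123._AC_UL320_.jpg">'
        . '<img src="https://m.media-amazon.com/images/I/def456._AC_UL320_.jpg">';

print_r(extract_image_urls($sample));
```

If the pattern finds both URLs in the sample but nothing in $result, the problem is more likely in what the server returned than in the regular expression.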
UPDATE #1:
Results of curl --head https://www.amazon.com/s?k=samsung
HTTP/1.1 503 Service Unavailable
Content-Type: text/html
Content-Length: 2671
Connection: keep-alive
Server: Server
Date: Tue, 15 Jun 2021 20:59:38 GMT
x-amz-rid: 9BVX8KQMWJ4QDJ75ETYV
Vary: Content-Type,Accept-Encoding,X-Amzn-CDN-Cache,X-Amzn-AX-Treatment,User-Agent
Last-Modified: Fri, 14 May 2021 19:08:48 GMT
ETag: "a6f-5c24ef9383000"
Accept-Ranges: bytes
Strict-Transport-Security: max-age=47474747; includeSubDomains; preload
Permissions-Policy: interest-cohort=()
X-Cache: Error from cloudfront
Via: 1.1 5345148f0ba8ae3c67b69d035acdbfc5.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: AMS50-C1
X-Amz-Cf-Id: AHdq2-QLEtCE4WvXZIEh_P75D8hCrHP09EAkNqBer5VBS-pI-blj1w==
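The same status check can be done from PHP before running the regular expression, since a 503 error page would not contain the product images. The following is a minimal sketch, not a fix: fetch_with_status and is_usable_response are hypothetical helper names, and the optional User-Agent parameter is an assumption based only on User-Agent appearing in the Vary header above. Setting it may or may not change the response.

```php
<?php
// Hypothetical helper: fetch a URL and return [HTTP status code, body].
function fetch_with_status(string $url, ?string $userAgent = null): array
{
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    if ($userAgent !== null) {
        // Assumption: the server may vary its response by User-Agent.
        curl_setopt($curl, CURLOPT_USERAGENT, $userAgent);
    }
    $body = curl_exec($curl); // false on transport failure
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE); // 0 if no response was received
    return [$status, $body === false ? '' : $body];
}

// Hypothetical helper: only a 200 response with a non-empty body is worth parsing.
function is_usable_response(int $status, string $body): bool
{
    return $status === 200 && $body !== '';
}

// Usage (performs a real request, so it is left commented out):
// [$status, $body] = fetch_with_status("https://www.amazon.com/s?k=samsung");
// if (!is_usable_response($status, $body)) {
//     echo "Request failed with HTTP status $status\n";
// }
```

Checking the status first separates "the request was rejected" from "the pattern did not match", which look identical when only $matches is printed.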