I have a list of over 500 URLs that I have to scrape, because my distributor doesn't offer an API or a CSV. The list is actually an array containing the IDs of the products I want to keep track of:
$arr = [1,2,3,...,564];
The URL is always the same; you only change the ID at the end of it:
$url = 'https://shop.com/products.php?id=';
Now, on localhost, I used a foreach loop to scrape each and every one of those URLs:
foreach ($arr as $id) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url . $id);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    // preg_match_all - get the data that I'm looking for
    // put that data into an array
    curl_close($ch);
}
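As a side note, I realize the loop above opens and closes a brand-new handle for every single request; I assume reusing one handle (so keep-alive can kick in) would be a bit gentler on the server. A rough sketch of what I mean:

// Sketch: one handle reused for all 564 requests instead of init/close
// per iteration; $arr and $url are the same as above.
$ch = curl_init();
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$data = [];
foreach ($arr as $id) {
    curl_setopt($ch, CURLOPT_URL, $url . $id);
    $data[$id] = curl_exec($ch);
    // preg_match_all parsing of $data[$id] goes here, as above
}

curl_close($ch);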
But the problem is, first of all, that I don't think this is wise at all - and I know that for a fact, because when I (accidentally) ran the script (on localhost) my access to shop.com was banned/blocked, with the message: 429 Too Many Requests :D.
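Right now I don't even check the response code, so the script can't react when the ban kicks in. I assume detecting the 429 would look roughly like this (the 30-second back-off is a pure guess on my part; the shop may well send a Retry-After header with the real value):

foreach ($arr as $id) {
    $ch = curl_init($url . $id);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $result = curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    if ($code === 429) {
        // blocked again: back off hard before continuing;
        // 30 seconds is a guess, not a value I know the shop expects
        sleep(30);
    }
}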
I tried to sleep inside that foreach every 10 iterations, using 10 as the modulus:
$x = 0;
foreach ($arr as $id) {
    // cURL request (as above) - get data and add it into an array
    $x++;
    if ($x % 10 == 0) {
        sleep(2);
    }
}
But this takes practically forever to execute: with 564 IDs that's 56 pauses of 2 seconds (nearly 2 minutes of pure sleeping), on top of 564 requests that still run one after another.
Even though I'm able to connect and take the data I need from each individual product, I want to find a solution using cURL (since there's no API nor CSV) that will run the whole thing at once, but in a safe/wise way.
Is there something like that? If yes, can you please help me understand how?
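To show the shape of what I'm imagining (completely untested; curl_multi is just what I keep running into when I search, and the batch size of 5 and the 1-second pause are guesses, not values I know shop.com tolerates):

// Sketch: batches of concurrent requests via curl_multi, with a pause
// between batches; $arr and $url are the same as above.
$batchSize = 5;

foreach (array_chunk($arr, $batchSize) as $batch) {
    $mh = curl_multi_init();
    $handles = [];

    foreach ($batch as $id) {
        $ch = curl_init($url . $id);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 1);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_multi_add_handle($mh, $ch);
        $handles[$id] = $ch;
    }

    // run all handles in this batch until they finish
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    foreach ($handles as $id => $ch) {
        $result = curl_multi_getcontent($ch);
        // preg_match_all parsing of $result goes here, as before
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }

    curl_multi_close($mh);
    sleep(1); // breathe between batches so I (hopefully) don't hit 429 again
}

Is that roughly the right direction, or is there a better/standard way to do this?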
Thank you!