I am using simple_html_dom to scrape website pages. The problem is that when I want to scrape many pages, for example 500 URLs, it takes a long time (5-30 minutes) to complete, and that makes my server return a 500 error.
Some of the things I have tried:
- using set_time_limit()
- setting ini_set('max_execution_time')
- adding delays between requests
I have read many times on Stack Overflow that you should use a cron job to split long-running PHP scripts. My question is: how do I split a long-running PHP script? Can you give me the best way to split it, with a step-by-step script, because I am a beginner? The pattern I understood from those answers is sketched below.
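Here is a rough sketch of my understanding of that cron approach (run_batch.php, progress.txt, and the batch size of 25 are my own assumptions; I have not tested this):

<?php
// run_batch.php - meant to be started by cron, for example every 5 minutes:
//   */5 * * * * php /path/to/run_batch.php
set_time_limit(0);

$progressFile = 'progress.txt'; // assumption: the next offset is remembered here
$batchSize = 25;

// where did the previous run stop?
$offset = file_exists($progressFile) ? (int) file_get_contents($progressFile) : 0;

$link = array('url1', 'url2', 'url3' /* ... more than 500 URLs ... */);

// take only the next batch of URLs, so one run stays short
$batch = array_slice($link, $offset, $batchSize);
if (empty($batch)) {
    exit; // all URLs are done
}

foreach ($batch as $url) {
    // process one URL here (e.g. call get_data() from scrape.php)
}

// remember where the next cron run should continue
file_put_contents($progressFile, $offset + count($batch));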
About my program: I have two files. File 1 contains an array of more than 500 URLs; file 2 contains the function that does the actual scraping.
For example, this is file 1:
set_time_limit(0);
ini_set('max_execution_time', 3000); // 3000 seconds = 50 minutes
$start = microtime(true); // start measuring the page render time
error_reporting(E_ALL);
ini_set('display_errors', 1);
include ("simple_html_dom.php");
include ("scrape.php");
$link = array('url1', 'url2', 'url3' /* ... more than 500 URLs ... */);
array_chunk($link, 25); // here I tried to split into chunks of 25, but it is not working
$hasilScrape = array();
foreach ($link as $url) {
    // this is where I call the get_data() function from scrape.php to scrape one URL
    $hasilScrape[] = json_decode(get_data($url), true);
}
$filename = 'File_Hasil_Scrape';
$fp = fopen($filename . ".csv", 'w');
foreach ($hasilScrape as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);
I have been thinking: can I split the link array into chunks of 25 and then pause, or temporarily stop the process (NOT a delay, because I have already tried that and it was useless), and then run it again, as in the sketch below? Can you tell me how, please? Thank you so much.
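For example, this is roughly what I imagine file 1 could become (only a sketch of my idea; passing the chunk number on the command line and opening the CSV in append mode are my own assumptions):

set_time_limit(0);
error_reporting(E_ALL);
ini_set('display_errors', 1);
include ("simple_html_dom.php");
include ("scrape.php");

$link = array('url1', 'url2', 'url3' /* ... more than 500 URLs ... */);

// split the URLs into chunks of 25 and keep the result this time
$chunks = array_chunk($link, 25);

// which chunk to process is passed when the script is called,
// e.g. "php file1.php 0", then "php file1.php 1", and so on
$chunkIndex = isset($argv[1]) ? (int) $argv[1] : 0;
if (!isset($chunks[$chunkIndex])) {
    exit("No such chunk.\n");
}

$hasilScrape = array();
foreach ($chunks[$chunkIndex] as $url) {
    $hasilScrape[] = json_decode(get_data($url), true);
}

// open in append mode so earlier chunks are not overwritten
$fp = fopen('File_Hasil_Scrape.csv', 'a');
foreach ($hasilScrape as $fields) {
    fputcsv($fp, $fields);
}
fclose($fp);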