
I'm currently working on a big scraping project, so I use one big main PHP script to run all of my scraping (over 150 websites); the script takes around 5-8 hours to run.

So in my main script I have a foreach that runs an exec('php -f ...') which runs the scraping for one website.

I would like to prevent this exec from crashing the whole run, because when I get multiple timeouts the script crashes and doesn't continue, so maybe I can replace:

echo exec('php -f ...');

with:

try {
    echo exec('php -f ...');
} catch (Exception $e) {
    // I will put a log here
    continue;
}

But I don't think that works with a timeout (fatal error), so what is the best way to run all of my scripts without one crash stopping the whole run?
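One thing worth noting: exec() does not throw an exception when the child PHP process hits a fatal error or timeout, so the try/catch above would never fire. A minimal sketch of checking the child's exit code instead (scrape.php and the site list are hypothetical stand-ins for your own script and data):

```php
<?php
// exec() never throws when the child process dies; a timeout or
// fatal error in the child only kills that child process. The
// parent can read the child's exit code via exec()'s third
// argument and skip the failed site.
$websites = ['site-a.example', 'site-b.example'];

foreach ($websites as $site) {
    $output = [];
    $status = 0;
    // A max_execution_time fatal error in scrape.php shows up here
    // as a non-zero $status; the parent loop keeps running.
    exec('php -f scrape.php ' . escapeshellarg($site) . ' 2>&1', $output, $status);

    if ($status !== 0) {
        error_log("Scraping $site failed (exit code $status), skipping");
        continue; // move on to the next website
    }
    echo implode("\n", $output), "\n";
}
```

This keeps the per-site output available for processing in the main script while letting any single site fail without stopping the run.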

Thanks!

RaYmMiE
  • If you don't need immediate output of the scripts, you can start the process/exec [async](https://stackoverflow.com/questions/222414/asynchronous-shell-exec-in-php) – DarkBee Mar 18 '22 at 09:51
  • I need the output, because my main script processes the result of the scraping exec ;) – RaYmMiE Mar 18 '22 at 09:52
  • You want to prevent it from timing out, without changing the timeout limit. This is a contradiction. – ADyson Mar 18 '22 at 09:53
  • But...maybe you can have a cron job which runs periodically and it works through a task list (e.g. from a database) so each time the job runs it scrapes a different website. Just a thought. – ADyson Mar 18 '22 at 09:54
  • Hmm, let me explain: sometimes a website doesn't respond or times out, so the script runs for more than 300 seconds and hits the timeout. I just want that script to time out without a fatal error and continue to the next scraping file – RaYmMiE Mar 18 '22 at 09:55
  • [Prevent timeout during large request in PHP](https://stackoverflow.com/questions/3909191/prevent-timeout-during-large-request-in-php) and/or [Prevent session expired in PHP Session for inactive user](https://stackoverflow.com/questions/5962671/prevent-session-expired-in-php-session-for-inactive-user) ? – Luuk Mar 18 '22 at 09:57
  • So far we only know that you are starting _some_ PHP script, but not what you are actually doing within it. Maybe you need to specify a proper connection / request timeout in the HTTP requests you are making to begin with? – CBroe Mar 18 '22 at 09:57
  • Still, 8 hours is a lot to do in one go; if you used a queue, it would take about 3-4 minutes per website, and would make debugging much smoother. Anyway, if you want to test the availability of a website, you can just make a header request with curl and lower curl's [timeout](https://stackoverflow.com/questions/2582057/setting-curls-timeout-in-php) to create a ping – DarkBee Mar 18 '22 at 09:58
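DarkBee's last suggestion could look like the sketch below: probe each site with a cheap HEAD request and a short curl timeout before running the full scrape. The function name and the 5-second values are assumptions, and the ext-curl extension must be enabled:

```php
<?php
// Quick availability check before launching a long-running scrape.
// Requires the curl extension.
function siteIsReachable(string $url, int $timeoutSeconds = 5): bool
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request, no body download
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the response
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeoutSeconds);
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
    $ok = curl_exec($ch) !== false;                  // false on timeout / connect failure
    curl_close($ch);
    return $ok;
}
```

A main loop could then skip unreachable sites up front, e.g. `if (!siteIsReachable($url)) { continue; }`, instead of waiting 300 seconds for the child script to hit its timeout.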

0 Answers