
I am trying to create a website monitoring webapp using PHP. At the minute I'm using curl to collect headers from different websites and update a MySQL database when a website's status changes (e.g. if a site that was 'up' goes 'down').

I'm using curl_multi (via the Rolling Curl X class, which I've adapted slightly) to process 20 sites in parallel, which seems to give the fastest results. I set CURLOPT_NOBODY so that only headers are collected, and I've tried to streamline the script to make it as fast as possible.

It is working OK: I can process 40 sites in approximately 2-4 seconds. My plan has been to run the script via cron every minute, so it looks like I will be able to process about 600 websites per minute. Although this is fine at the minute, it won't be enough in the long term.
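The 600-per-minute estimate follows from the slow end of the measured 2-4 seconds per 40-site batch, assuming batches run back to back for the full minute (the 2-second case is shown for comparison):

```latex
\text{sites per minute} = \frac{60\,\text{s}}{t_{\text{batch}}} \times 40,
\qquad t_{\text{batch}} = 4\,\text{s} \;\Rightarrow\; 15 \times 40 = 600,
\qquad t_{\text{batch}} = 2\,\text{s} \;\Rightarrow\; 30 \times 40 = 1200
```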

So how can I scale this? Is it possible to run multiple cron jobs in parallel, or will this run into bottlenecks?

Off the top of my head, I was thinking that I could split the site records in the database into groups of 400 and run a separate script for each group (e.g. IDs 1-400, 401-800, 801-1200, etc.), so there would be no danger of database corruption. This way each script would complete within a minute.
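The grouping plan can be sketched as a small generator that splits IDs into contiguous ranges and prints one crontab line per range. This assumes site IDs are contiguous integers starting at 1; `id_ranges()` is a hypothetical helper and `/user/script.php` is a placeholder path, not part of any existing setup.

```php
<?php
// Sketch: split site IDs 1..$maxId into contiguous ranges of $chunkSize
// and emit one crontab line per range. Assumes contiguous IDs from 1;
// id_ranges() and the script path are hypothetical placeholders.

/**
 * @return array<int, array{0:int, 1:int}> list of [minId, maxId] pairs
 */
function id_ranges(int $maxId, int $chunkSize): array
{
    if ($maxId < 1 || $chunkSize < 1) {
        return [];
    }
    $ranges = [];
    for ($start = 1; $start <= $maxId; $start += $chunkSize) {
        // The last range is clipped so it never runs past $maxId.
        $ranges[] = [$start, min($start + $chunkSize - 1, $maxId)];
    }
    return $ranges;
}

// Print one cron entry per range (every minute, as in the plan above).
foreach (id_ranges(1200, 400) as [$min, $max]) {
    echo "* * * * * php /user/script.php $min $max\n";
}
```

With 1200 sites this prints entries for 1-400, 401-800 and 801-1200. If IDs have gaps, ranges by ID will contain fewer than 400 sites each, which is harmless for this purpose.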

However, it feels like this might not work, since a single script running curl_multi seems to max out at 20 parallel requests. Will this work, or is there a better approach?
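For reference, the "20 in parallel" pattern is a rolling window: keep at most N header-only requests in flight and start a new one as soon as any finishes. The sketch below uses only PHP's built-in curl_multi functions; it is a stand-in for what a class like Rolling Curl X does internally, not that class's API, and `check_sites()` is a hypothetical name.

```php
<?php
// Sketch of a rolling curl_multi window using CURLOPT_NOBODY (headers only).
// check_sites() is hypothetical; it returns URL => HTTP status, 0 on failure.

/**
 * @param string[] $urls
 * @return array<string, int>
 */
function check_sites(array $urls, int $window = 20, int $timeout = 10): array
{
    $results = [];
    if ($urls === []) {
        return $results;
    }
    $mh = curl_multi_init();
    $queue = array_values($urls);
    $active = 0; // handles currently attached to $mh

    // Attach the next queued URL as a header-only request.
    $addNext = function () use (&$queue, &$active, $mh, $timeout): void {
        $url = array_shift($queue);
        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_NOBODY         => true,  // headers only
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_FOLLOWLOCATION => true,
            CURLOPT_TIMEOUT        => $timeout,
            CURLOPT_PRIVATE        => $url,  // remember which URL this is
        ]);
        curl_multi_add_handle($mh, $ch);
        $active++;
    };

    // Fill the initial window.
    while ($active < $window && $queue !== []) {
        $addNext();
    }

    do {
        curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity, don't spin
        }
        // Harvest finished transfers and refill the window.
        while ($info = curl_multi_info_read($mh)) {
            $ch = $info['handle'];
            $url = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$url] = $info['result'] === CURLE_OK
                ? (int) curl_getinfo($ch, CURLINFO_RESPONSE_CODE)
                : 0;
            curl_multi_remove_handle($mh, $ch); // handle freed once unreferenced
            $active--;
            if ($queue !== []) {
                $addNext();
            }
        }
    } while ($active > 0);

    curl_multi_close($mh);
    return $results;
}
```

Because each call caps its own concurrency at `$window`, several such processes (one per ID range) each open their own 20 connections. Whether that helps depends on bandwidth and CPU on the monitoring host rather than on PHP itself.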

Ryan

1 Answer


Yes. The simple solution is to use the same PHP CLI script and pass it two arguments: the minimum and maximum IDs of the database records (each record holding one site's information) that this run should process.

Example crontab entries:
* * * * * php /user/script.php 1 400
* * * * * php /user/script.php 401 800
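Inside the script, the two arguments arrive in `$argv`. A minimal sketch of validating them and selecting only that slice of sites might look like this; the table and column names (`sites`, `id`, `url`) and both function names are hypothetical and should be adapted to the real schema.

```php
<?php
// Sketch for `php /user/script.php <minId> <maxId>`.
// Table/column names (sites, id, url) are hypothetical placeholders.

/**
 * Validate the ID range passed on the command line.
 * @return array{0:int, 1:int}
 */
function parse_site_range(array $argv): array
{
    if (count($argv) !== 3) {
        throw new InvalidArgumentException('usage: php script.php <minId> <maxId>');
    }
    $opts = ['options' => ['min_range' => 1]];
    $min = filter_var($argv[1], FILTER_VALIDATE_INT, $opts);
    $max = filter_var($argv[2], FILTER_VALIDATE_INT, $opts);
    if ($min === false || $max === false || $min > $max) {
        throw new InvalidArgumentException('minId and maxId must be positive integers with minId <= maxId');
    }
    return [$min, $max];
}

/** Fetch only this worker's slice of URLs with a prepared statement. */
function fetch_site_urls(PDO $pdo, int $min, int $max): array
{
    $stmt = $pdo->prepare('SELECT url FROM sites WHERE id BETWEEN :min AND :max');
    $stmt->execute([':min' => $min, ':max' => $max]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

A caller would do `[$min, $max] = parse_site_range($argv);` and then pass the result of `fetch_site_urls($pdo, $min, $max)` to its curl_multi loop. Since each process only reads and updates rows in its own range, the cron jobs do not step on each other's records.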

Alternatively, a single script can use multi-threading (in PHP, via the pthreads extension). In either case, the cron interval should be based on a benchmark of how long it takes to process all 800 sites.

Ref: How can one use multi threading in PHP applications

For example, if the multithreaded script completes in 3 minutes, set the cron interval to */3 (every 3 minutes).
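Turning a measured runtime into the cron minute field is just a round-up to whole minutes. The helper below is a hypothetical sketch; the runtime figure would come from your own benchmark, for example `microtime(true)` taken before and after a full run.

```php
<?php
// Sketch: convert a measured worst-case runtime (seconds) into a cron
// minute field. Rounding up keeps consecutive runs from overlapping;
// cron cannot schedule more often than once a minute.
function cron_minute_field(float $runtimeSeconds): string
{
    $minutes = max(1, (int) ceil($runtimeSeconds / 60));
    if ($minutes > 59) {
        // Step values in the minute field only make sense below 60.
        throw new DomainException('runtime too long for a minute-field step');
    }
    return $minutes === 1 ? '*' : "*/$minutes";
}
```

For a 3-minute run this gives `*/3`, matching the example above. Steps that do not divide 60 (e.g. `*/7`) restart at minute 0 each hour, so the last gap of the hour is shorter.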

Senthil
  • Thanks - this is a great help. I don't think that I can use pthreads because "the pthreads extension cannot be used in a web server environment" and this is on a webserver. – Ryan Feb 27 '17 at 12:36