
I have an nginx + PHP + MySQL server. MySQL holds a large database of jobs. The PHP script I am running should retrieve all the jobs from the database and output an XML feed containing them. The script is currently organized like this:

$arr = get_all_job_ids(); // returns a ~18k-element PHP array, fueled by SELECT `id` FROM `jobs`;
foreach ($arr as $i => $id) {
    if ($i > 9700) { break; } // for debugging
    $job = get_job_by_id($id); // PHP array generated by SELECT `title`, `desc`, ... FROM `jobs` WHERE `id` = $id;
    $job_xml = replace_job_tags($job, $xml_template); // regular expressions
    echo $job_xml;
    flush();
}

Nobody else is on the server; it is dedicated to experiments and nothing else is running on it. First of all, even though I free SQL results and explicitly clean up anything PHP might not be cleaning up itself, overall memory consumption keeps growing throughout the loop. It drops after the flush(), but never back to the level it was at when the iteration started.
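
A minimal sketch of how the per-iteration memory could be logged (not part of the original script; memory_get_usage() and error_log() are standard PHP, the loop body mirrors the code above, and the 500-row interval is arbitrary):

foreach ($arr as $i => $id) {
    if ($i > 9700) { break; } // for debugging
    $job     = get_job_by_id($id);
    $job_xml = replace_job_tags($job, $xml_template);
    echo $job_xml;
    flush();
    unset($job, $job_xml); // drop references explicitly
    if ($i % 500 === 0) {
        // log memory every 500 rows to see where the growth happens
        error_log(sprintf('iteration %d: %.1f MB', $i, memory_get_usage(true) / 1048576));
    }
}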

Secondly, and more importantly, the run times and CPU loads are completely inconsistent. Sometimes a 9.7k-job feed is generated nicely within 17 seconds. In those cases, according to "top" and "SHOW FULL PROCESSLIST;", CPU briefly spikes to 100% during the get_all_job_ids() step, then calms down and takes its time retrieving and flush()ing the jobs one by one.

But at other times php5-fpm and mysqld grab all the CPU for themselves, both during the initial id retrieval step and during the looped queries for individual jobs. And even though "SHOW FULL PROCESSLIST;" shows individual jobs being queried, the HTTP client never receives any output and eventually gets a "504 Gateway Time-out". After a relatively long time (minutes), mysqld and php5-fpm return to normal. Also, when I skip the get_job_by_id() step and hardcode an array there instead, everything runs nicely and smoothly.
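
For reference, a hedged sketch of what could rule out output buffering between PHP and nginx as the reason the client never sees any bytes (these are standard PHP calls; whether they matter depends on nginx's fastcgi_buffering / fastcgi_read_timeout settings, which are assumptions here, not something confirmed in this setup):

header('X-Accel-Buffering: no'); // ask nginx not to buffer this response
while (ob_get_level() > 0) {     // drop any PHP output buffers that would swallow flush()
    ob_end_flush();
}
ob_implicit_flush(true);         // flush automatically after every echo
set_time_limit(0);               // rule out PHP's own max_execution_time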

I am completely out of ideas about what could be causing this and what else I could try to shed more light on the issue. If you have any ideas, I'd be glad to hear them!

  • Also I can't really find a stable number to reproduce this. Sometimes 9.7k is ok but right after that 5k can hit the problem. – Eugene Oct 19 '12 at 16:47
  • The inconsistency in run time may be explained by caching. MySQL has a query cache, and between MySQL and the OS, parts of files get cached and don't result in real disk seeks and reads. Sometimes the cache just thrashes and things get really slow, but I would imagine you'd see back-to-back consistency unless other stuff on the server is making significant active use of system resources, which can push things out of cache. – goat Oct 19 '12 at 16:52
  • The answer to "Why is it slow?" is always the same: Profile it, e.g. http://stackoverflow.com/a/21189/82769 – Frank Farmer Nov 30 '12 at 22:29

1 Answer


Why do you need to do "SELECT id FROM foo" and then "SELECT ... FROM foo WHERE id = ..."? The BIG problem here is that this should be done in one query.
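
Something along these lines might work (a rough sketch, assuming mysqli and the replace_job_tags() / $xml_template helpers from the question; the credentials are placeholders; MYSQLI_USE_RESULT streams rows instead of buffering all 18k of them in PHP at once):

$mysqli = new mysqli('localhost', 'user', 'pass', 'db'); // placeholder credentials
$result = $mysqli->query('SELECT `id`, `title`, `desc` /* , ... */ FROM `jobs`', MYSQLI_USE_RESULT);
while ($job = $result->fetch_assoc()) {
    echo replace_job_tags($job, $xml_template); // one row in, one XML block out
    flush();
}
$result->free();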

SergeS