
I am writing a script for an e-commerce site. It will allow customers to subscribe to changes in one or more of many categories (any combination), so the email sent to an individual customer is potentially unique to that customer. It will be run by cron once a week.

It is a busy site and it is possible that there may ultimately be tens of thousands of subscriptions, each one requiring processing and then the sending of an email. So, the question is: what is the best way of doing this?

If everything is done in a single script run, the process will inevitably time out.

I could run individual cron jobs until the list of subscriptions is completed? Or perhaps have the script redirect to itself until the list is complete?

Is there a standard way of doing this? Any advice is gratefully accepted.

And I apologise if this is a duplicate. I have searched but finding the right search terms is tricky.

niccol
  • A PHP script called from a CRON will not time out... only web requests time out, I think. – Oli Oct 20 '14 at 08:50
  • Perform everything in batches of (random number) 1,000. Update 1,000 customers at a time then move on to the next batch of customers. Of course, do this at off peak times. – ʰᵈˑ Oct 20 '14 at 08:51
  • If you just worry about the timeout: http://php.net/manual/en/function.set-time-limit.php You could use it, but then you run into another problem with a cronjob: sooner or later multiple cron jobs with the same task will run at the same time, causing unnecessary server load. –  Oct 20 '14 at 08:51
  • @Oli - that is incorrect. Any PHP script has the capability to time out. There are settings and options that you can pass to prevent this. – Lix Oct 20 '14 at 08:53
  • I stand corrected: http://stackoverflow.com/questions/5874950/set-max-execution-time-in-php-cli – Oli Oct 20 '14 at 08:54
  • @FIA2008 I guess your concern could be solved by using an exclusive lock. – pikachu0 Oct 20 '14 at 08:55
  • Thanks to all. Yes, I know about setting the time limit. I just don't like doing that on processes that grow. So, as an example, this starts with 10K iterations and it works fine. But then a year later we have 200K iterations and it fails. Setting the time limit is always a band-aid, but that is just my opinion. It is clear that batch processing is better. The question is how to organise that batch processing. By subsequent cron jobs or by a redirect? – niccol Oct 20 '14 at 09:10
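pikachu0's exclusive-lock suggestion can be sketched in PHP with `flock()`. The lock-file path and the `acquireLock()` helper name here are illustrative, not from the original discussion:

```php
<?php
// Hypothetical helper: take an exclusive, non-blocking lock on a file
// so that an overlapping cron run exits immediately instead of piling up.
function acquireLock(string $path)
{
    $handle = fopen($path, 'c'); // create the lock file if it does not exist
    if ($handle === false) {
        return false;
    }
    // LOCK_NB makes flock() return at once rather than wait for the lock.
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    return $handle; // keep the handle open for the lifetime of the script
}

$lock = acquireLock('/tmp/newsletter-cron.lock');
if ($lock === false) {
    exit("Previous run still in progress\n");
}
// ... process the subscription batches here ...
flock($lock, LOCK_UN);
fclose($lock);
```

The lock is tied to the open file handle, so even if the script dies, the operating system releases it; a stale lock file on disk is harmless.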

1 Answer


Perform the update in batches.

Brief

  • Add a flag to your table
  • Grab records where the flag is still at the default value
  • Update the customers
  • Update the flag
  • Repeat.
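The steps above can be sketched as a runnable loop. This uses an in-memory SQLite database as a stand-in for the real MySQL table; the table and column names (`foo`, `is_updated`) follow the answer, and `sendEmail()` is a hypothetical placeholder for the per-customer work:

```php
<?php
// Stand-in database: 2,500 fake customers, none processed yet.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE foo (id INTEGER PRIMARY KEY, email TEXT, is_updated INTEGER NOT NULL DEFAULT 0)');
for ($i = 1; $i <= 2500; $i++) {
    $db->exec("INSERT INTO foo (email) VALUES ('customer{$i}@example.com')");
}

function sendEmail(array $row): void
{
    // Placeholder for the real per-customer work (build and send the email).
}

$batchSize = 1000;
while (true) {
    // 1. Grab the next batch of rows whose flag is still at the default value.
    $rows = $db->query("SELECT * FROM foo WHERE is_updated = 0 ORDER BY id ASC LIMIT {$batchSize}")
               ->fetchAll(PDO::FETCH_ASSOC);
    if (count($rows) === 0) {
        break; // nothing left to do
    }
    // 2. Process the customers in this batch.
    foreach ($rows as $row) {
        sendEmail($row);
    }
    // 3. Flag exactly the rows just processed, not "the first 1,000 rows".
    $ids = implode(',', array_column($rows, 'id'));
    $db->exec("UPDATE foo SET is_updated = 1 WHERE id IN ({$ids})");
}

echo $db->query('SELECT COUNT(*) FROM foo WHERE is_updated = 1')->fetchColumn(); // prints 2500
```

In the cron setup described in the question, each run (or each iteration of this loop) works through one batch, so a timeout or crash loses at most one batch of progress.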

Explanation

We need to create a "flag" in our table

ALTER TABLE `foo`
ADD COLUMN `is_updated` tinyint(1) NOT NULL DEFAULT 0; -- 0 = not updated, 1 = updated

So, in the file called by the cron, we can do the following to fetch the next 1,000 unprocessed rows:

SELECT * FROM `foo` WHERE `is_updated` = 0 ORDER BY `id` ASC LIMIT 1000

Then we can perform our actual logic with these 1,000 entries, and finally flag them as processed.

UPDATE `foo` SET `is_updated` = 1 WHERE `is_updated` = 0 ORDER BY `id` ASC LIMIT 1000
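One caveat (an alternative, not from the original answer): an `UPDATE ... ORDER BY ... LIMIT` can drift from the rows the SELECT actually returned if new rows are inserted between the two statements. Flagging the exact ids that were fetched avoids that; the id list below is illustrative and would come from the preceding SELECT:

```sql
-- Flag exactly the rows processed in this batch.
UPDATE `foo` SET `is_updated` = 1 WHERE `id` IN (1, 2, 3, 1000);
```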

Notes

  • Depending on the amount of processing you're doing, I would definitely recommend doing this at off-peak hours, or shipping it to a slave server dedicated to just this task.
  • It may be worthwhile to create a queuing table that records the customers needing an update once a particular category has changed, and grab the customer e-mail from that table (but still use the "flag" idea to update in batches).
  • Have a read of this for tips on sending a lot of e-mails out at once.
  • Have a read of this for answers on multi-threading the customer update process.
ʰᵈˑ
  • Thanks. Yes, I understand the concept of batch processing. The tables are flagged already for that reason. And, yes, evidently peak time is not the best time to run. The thread on multi-threading is what I was looking for . Thanks again. – niccol Oct 20 '14 at 09:53