I have a website that uses MySQL. It has a table named "People" in which each row represents a person. When a user opens a person's page, I would like to show news related to that person (along with the information from the MySQL table). For that purpose, I decided to use the BING News Source API.

The problem with calling the BING API on every page load is that it increases the load time of my page (a round trip to the BING servers). Therefore, I have decided to pre-fetch all the news and save it in my table under a column named "News".

Since my table contains 5,000+ people, running a PHP script that downloads the news for every person and updates the table in one go results in a "Fatal error: Maximum execution time" error (I would rather not disable the timeout, since it is a good security measure).

What would be a good and efficient way to run such a script? I know I can run a cron job every 5 minutes that updates only a portion of the rows each time, but even in that case, what would be the best way to save the current offset? Should I save the offset in MySQL, or as a server variable?

Joel

2 Answers

1

Why not load the news section of the page via AJAX? This would mean that the rest of the page would load quickly, and the delay created from waiting for BING would only affect the news section, which you could allocate a loading placeholder to.

Storing the news in the DB doesn't sound like a very efficient or practical solution; the ongoing management of the records alone could become a headache in the future.

SW4
  • Hmm, yes, you would need to cater for this. That said, if your AJAX calls are structured correctly, it should be fine. See here (from Google): http://code.google.com/web/ajaxcrawling/docs/getting-started.html – SW4 Nov 18 '10 at 09:53
  • @joel.. the news is displayed only when users log in, right? Then what's the point of search engines indexing the news items? – Shoban Nov 18 '10 at 09:55
1
  • Use a cron job for a complex job like this.
  • You should increase the timeout if you plan to run it as a cron job (you are pulling data from another site, and the script is not public-facing).
  • Consider creating a master script (triggered by the cron job) that spawns multiple sub-scripts (with some control over them), so that you can pull the data from the BING News Source API for the 5000+ profiles in parallel instead of downloading them one by one sequentially (think batch processing).
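
As a rough illustration of the master/sub-script idea, the sketch below splits a list of ids into batches and runs one worker process per batch, a few at a time. It is written in Python rather than PHP. `WORKER_CODE`, `MAX_PARALLEL`, and the function names are hypothetical, and the worker only echoes its ids where a real one would call the news API and update the table.

```python
import subprocess
import sys

MAX_PARALLEL = 3  # hypothetical cap on simultaneous workers

# Stand-in worker: a real one would fetch news for the ids it is given
# and write the results to the database.
WORKER_CODE = "import sys; print('done', ','.join(sys.argv[1:]))"


def chunks(ids, size):
    """Split the id list into consecutive batches of at most `size` items."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def run_master(ids, batch_size, max_parallel=MAX_PARALLEL):
    """Run one worker process per batch, at most `max_parallel` at a time."""
    outputs = []
    batches = chunks(ids, batch_size)
    for start in range(0, len(batches), max_parallel):
        # Launch a wave of workers; each one is a separate OS process.
        wave = [
            subprocess.Popen(
                [sys.executable, "-c", WORKER_CODE, *map(str, batch)],
                stdout=subprocess.PIPE,
                text=True,
            )
            for batch in batches[start:start + max_parallel]
        ]
        # Wait for the whole wave before starting the next one.
        for proc in wave:
            out, _ = proc.communicate(timeout=60)
            outputs.append(out.strip())
    return outputs


if __name__ == "__main__":
    print(run_master(list(range(1, 8)), batch_size=2))
```

Because every worker is its own process, a slow or failed batch does not take the others down with it, and the cap keeps the number of simultaneous requests to the remote API bounded.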

Update

Cron is a time-based job scheduler in Unix-like computer operating systems. The name cron comes from the word "chronos", Greek for "time". Cron enables users to schedule jobs (commands or shell scripts) to run periodically at certain times or dates. It is commonly used to automate system maintenance or administration, though its general-purpose nature means that it can be used for other purposes, such as connecting to the Internet and downloading email

Cron - on Wiki

ajreal
  • Thank you! Is it possible to increase the timeout only for these scripts, while all other PHP scripts keep the default timeout? – Joel Nov 18 '10 at 09:51
  • Spawning a PHP process does not refer to include or require; it means invoking PHP directly again to run a script. Each spawned process therefore uses its own default timeout. Refer here: http://stackoverflow.com/questions/45953/php-execute-a-background-process – ajreal Nov 18 '10 at 09:56
  • Oh OK, sorry. I got it. You're suggesting not to run it as a script through Apache (which has a timeout limit), but as a PHP process in Linux. Thanks! – Joel Nov 18 '10 at 10:04