
I'm working on a site that pulls product price data from Amazon.com and Walmart. I'm guessing that in the future, it will also pull data from other places.

My first idea was to pull the data directly from Amazon (using their Product Advertising API) and then display it on the site for every single visitor who landed on the page. That's not a bad idea if there aren't many product prices to retrieve (or if the number of site visitors is low), but I think I will run into problems once the site gets busy and as I increase the number of products whose prices I want to pull.

Using the Amazon and Walmart APIs, I was able to make successful REST API calls and parse the returned XML to obtain the information I needed.

Does it make sense to store that information in a local database, update it, say, every 1 to 5 minutes, and then have site visitors pull the pricing information from my local database instead of making API calls to Amazon and Walmart?

If I do go this route and create a function that uses the Amazon and Walmart APIs to pull price data, how do I then automatically run this function every 1 to 5 minutes in the background, 24/7/365?

Good Lux

2 Answers


Does it make sense to store that information in a local database

Yes. Actually this sounds exactly like a typical caching setup. I would recommend looking into Redis instead of using a relational database for this.
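
For illustration, here is a minimal sketch of that kind of cache in Python using the redis client; the key names, TTL, and data shape are my own assumptions, not anything from your existing code:

    import json
    import redis  # pip install redis

    r = redis.Redis(host="localhost", port=6379, db=0)

    CACHE_TTL = 300  # seconds; anywhere in your 1-5 minute range

    def store_price(sku, price_data):
        """Cache the parsed API response; Redis expires it automatically after the TTL."""
        r.setex(f"price:{sku}", CACHE_TTL, json.dumps(price_data))

    def get_cached_price(sku):
        """Return the cached price dict for a SKU, or None if it is missing or expired."""
        raw = r.get(f"price:{sku}")
        return json.loads(raw) if raw else None

Your page code would call get_cached_price() first and only hit Amazon/Walmart (or your background updater) when it returns None.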

how do I then automatically run this function every 1 to 5 minutes in the background

Probably a cron job. You would have to provide more information, such as where your application is running (AWS EC2 or somewhere else?) and whether it is running on Linux or Windows, before I could give a more detailed recommendation.
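
As a rough illustration, assuming a Linux host with cron available and a script at /home/user/update_prices.py (a hypothetical path for whatever your price-pulling function lives in), a crontab entry like this would run it every 5 minutes, around the clock:

    */5 * * * * /usr/bin/python3 /home/user/update_prices.py >> /var/log/price_updates.log 2>&1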

Mark B

It depends on your load and cache hit rate. For example, if you only have 100 visitors a day viewing a couple of product pages, there is no need to update 1000+ products every minute; you may not even need to store anything.

But if your visitors view the same pages often, then caching will be useful.

Different strategies then come into play:

  • prefilled cache (the one you mentioned) - fetch all data in advance and keep updating it via a cron job or a dedicated daemon. This speeds up the page load for the first visitor a bit, but it is the most bandwidth-expensive option.
  • on-demand caching - start with an empty cache and only fetch data on the first request (or when a request comes in for data that has expired). The first request will be slower, but this ensures that only the data that is actually needed gets requested and cached.
  • combinations of the above: for example, fetch on the first request, but then keep updating in a background job (see the sketch after this list)
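
To make that last option concrete, here is a rough sketch in Python. The in-memory dict stands in for Redis or a database table, and fetch_price_from_api is a placeholder for the Amazon/Walmart calls you already have; none of these names come from a real SDK:

    import time
    import threading

    CACHE_TTL = 300  # seconds
    _cache = {}      # sku -> (timestamp, price_data)

    def fetch_price_from_api(sku):
        """Stand-in for your existing Amazon/Walmart API call and XML parsing."""
        return {"sku": sku, "price": 0.0}  # replace with the real lookup

    def get_price(sku):
        """On-demand caching: serve from cache if fresh, otherwise fetch and cache."""
        entry = _cache.get(sku)
        if entry and time.time() - entry[0] < CACHE_TTL:
            return entry[1]
        price = fetch_price_from_api(sku)
        _cache[sku] = (time.time(), price)
        return price

    def refresh_loop(interval=300):
        """Background job: keep already-cached SKUs fresh so visitors rarely wait."""
        while True:
            for sku in list(_cache):
                _cache[sku] = (time.time(), fetch_price_from_api(sku))
            time.sleep(interval)

    threading.Thread(target=refresh_loop, daemon=True).start()

This only refreshes products that someone has actually looked at, which keeps API usage proportional to real traffic.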
Vasfed
  • Thanks for this info. The reason I was thinking about going with the prefilled cache (thanks for the terminology) is that I had a page with around 7 products on it. If I refreshed the page too quickly, I would get an error from the API because I was making too many requests at once. – PHPNeophyte Dec 28 '15 at 20:09
  • If results from the first page load are cached, then a refresh will read from the cache and have no impact on API usage. But you may have to throttle the requests a bit (add a delay between them, as in the sketch below), so the first load will be slower. – Vasfed Dec 28 '15 at 20:11
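
For what it's worth, the throttling mentioned above can be as simple as pausing between calls. A sketch with a made-up fetch_one callable, not code from either API's SDK:

    import time

    def fetch_prices_throttled(skus, fetch_one, delay=1.0):
        """Fetch each SKU in turn, sleeping between calls to stay under the API's rate limit."""
        results = {}
        for sku in skus:
            results[sku] = fetch_one(sku)
            time.sleep(delay)  # tune to whatever the API's documented limit allows
        return results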