
I need to scrape data from certain websites at 12:00 AM PST and present the scraped data on my website. How should I implement this? Should it run server-side or client-side? Should I use meteor-synced-cron?

I was thinking of doing it without meteor-synced-cron, in client/ instead: if the time is 12:00 AM, I update my collection once. Is that the right approach?

Mathguy

1 Answer


Use synced-cron from the server; you'll be much happier, sooner. If you do it from the client, you have to (a) ensure at least one client is up and running at midnight and (b) make sure it's the right client, with the proper privileges, so that not every client scrapes everything.

OTOH, if you want to distribute a job to multiple clients and have them all cooperate then that's a completely different proposition.

Anywhere in /server add:

SyncedCron.add({
  name: 'Daily Scraper',
  schedule: function(parser) {
    return parser.text('every 1 day'); // parser is a later.parse object.
  },
  job: function() {
    // ... your scraping code here
  }
});

See Later.js for details on how to create the schedule.
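For the midnight-PST requirement in the question, a rough sketch might look like the one below. It rests on a few assumptions that are not from the original answer: the schedule is evaluated in UTC (via the package's `utc` option), midnight PST is taken as 08:00 UTC (this drifts by an hour during daylight saving time), and the names `SCHEDULE_TEXT_UTC` and `dailyScraperJob` are made up for illustration. Also note that synced-cron does not process jobs until `SyncedCron.start()` is called.

```javascript
// Assumption: PST is UTC-8, so 12:00 AM PST corresponds to 8:00 AM UTC.
// During daylight saving time (PDT, UTC-7) this fires at 1:00 AM local time.
const SCHEDULE_TEXT_UTC = 'at 8:00 am';

// Hypothetical job definition, kept as a plain object so it can be inspected.
const dailyScraperJob = {
  name: 'Daily Scraper',
  schedule: function (parser) {
    // parser is a later.parse object; text() takes Later.js schedule text.
    return parser.text(SCHEDULE_TEXT_UTC);
  },
  job: function () {
    // ... your scraping code here
  }
};

// Register the job only where SyncedCron exists (i.e. on the Meteor server).
if (typeof SyncedCron !== 'undefined') {
  SyncedCron.config({ utc: true }); // evaluate schedules in UTC, not local time
  SyncedCron.add(dailyScraperJob);
  SyncedCron.start();               // jobs are not processed until start() runs
}
```

If the server's local time zone is already Pacific time, an alternative is to leave `utc` at its default and schedule `'at 12:00 am'` instead.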

Michel Floyd
  • Makes sense! How do I do it on the server side? Do I add a synced-cron in the Meteor.startup() function or what? – Mathguy Oct 01 '15 at 23:22
  • Perfect, thanks! And just to clarify, this won't need me to do a "meteor reset" everyday, right? – Mathguy Oct 02 '15 at 00:51
  • Definitely not. My app has a number of synced-cron jobs that remove old data; you might need something like that if you're web scraping and want to purge obsolete data. Otherwise, why would you need to reset? – Michel Floyd Oct 02 '15 at 03:06
  • Also, if I include this in a file called scrape.js in the server/ directory, will it automatically execute the script in the server? – Mathguy Oct 02 '15 at 18:25
  • Thanks! In general, in what order will Meteor execute scripts in the server/ directory? I'm assuming any .js file in the server/ is executed at some point in time. Is that right? – Mathguy Oct 02 '15 at 19:28
  • Thanks for that link, makes sense now. So does that imply I don't even need to put this inside a function, i.e. just create a file (say, scrape.js) which simply has the code you mentioned above? – Mathguy Oct 02 '15 at 20:24
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/91207/discussion-between-michel-floyd-and-mathguy). – Michel Floyd Oct 02 '15 at 20:34