
I am building a scraper app with Node.js, and I'd like it to scrape a certain site twice a day. There's a problem, though.

What I'm used to doing is this: someone makes a request from the client side, and the app scrapes the data and shows the result.

But what if I want the app to do the scraping twice a day on its own, without a client having to make a request to the server? How does one do that?

Basically, it's a site where the user enters the keywords they are searching for. The app searches for those keywords every day and notifies the user when a keyword shows up on the page. So, how does one do that without the user having to search for the keyword every day?

It seems we can use cron jobs for scheduling, so the scraping will happen twice a day or as often as I choose. But how do I send the scraped data to the client side? Or how do I notify the site user that the keyword was found, so they can come to the site and look at it?

faraz

2 Answers


But what if I want the app to do the scraping twice a day on its own, without a client having to make a request to the server? How does one do that?

You use a task scheduler, such as Cron.
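
With cron itself, a schedule expression such as `0 8,20 * * *` runs a command at 08:00 and 20:00 every day. If you would rather keep the scheduling inside the Node process, the same idea can be sketched with plain timers and no extra packages. This is only a minimal sketch: the hours 8 and 20 and the names `RUN_HOURS`, `msUntilNextRun`, and `scheduleDaily` are assumptions made for illustration, not part of any library.

```javascript
// Hours of the day (server local time) at which to scrape.
// 8 and 20 are arbitrary example values.
const RUN_HOURS = [8, 20];

// Milliseconds from `now` until the next occurrence of any of the given hours.
function msUntilNextRun(now, hours) {
    const candidates = [];
    for (const hour of hours) {
        const today = new Date(now);
        today.setHours(hour, 0, 0, 0);
        const tomorrow = new Date(today);
        tomorrow.setDate(tomorrow.getDate() + 1);
        candidates.push(today, tomorrow);
    }
    // Keep only times strictly in the future and pick the nearest one.
    const future = candidates.filter((d) => d > now).map((d) => d - now);
    return Math.min(...future);
}

// Runs `task` at each of the given hours, every day.
// After each run the timer is re-armed from the time that was targeted,
// so a timer firing slightly early cannot trigger the same slot twice.
function scheduleDaily(hours, task, from = new Date()) {
    const target = new Date(from.getTime() + msUntilNextRun(from, hours));
    const delay = Math.max(0, target.getTime() - Date.now());
    return setTimeout(async () => {
        try {
            await task();
        } catch (err) {
            console.error('scheduled task failed:', err);
        }
        scheduleDaily(hours, task, target);
    }, delay);
}
```

To start it, call `scheduleDaily(RUN_HOURS, scrape)` once when the server boots, where `scrape` stands for whatever scraping function you already have. Unlike a system cron job, this only runs while the Node process is alive.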

How do I notify the site user that the keyword was found, so they can come to the site and look at it?

There are lots of options, for example email, text messages, or a notice shown the next time they visit the site.

Quentin
  • Thanks, that helps a lot. So, do you think there is a way to send the results to the site, so that when the customer is notified they can come and look at the results there? If it were a client request, I could have sent the results using res.send. But here, what do I use? – faraz Jul 31 '18 at 03:42
  • You would have to store them somewhere (e.g. in a database) and associate them with the user. – Quentin Jul 31 '18 at 08:06
  • Hmm, that's what I was thinking too. Thanks, understood. – faraz Jul 31 '18 at 09:04
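
The idea from the comments above (store the results somewhere and associate them with the user) could be sketched roughly as follows. An in-memory `Map` stands in for the database here, and the names `matchesByUser`, `saveMatch`, and `takeUnseenMatches` are hypothetical, chosen only for this illustration.

```javascript
// In-memory stand-in for a database table of keyword matches, keyed by user id.
// A real app would use a persistent store so results survive a restart.
const matchesByUser = new Map();

// Called by the scheduled scraper when a user's keyword was found.
function saveMatch(userId, keyword, foundAt = new Date()) {
    const matches = matchesByUser.get(userId) || [];
    matches.push({ keyword, foundAt, seen: false });
    matchesByUser.set(userId, matches);
}

// Called from an ordinary request handler when the user visits the site.
// Returns the matches not shown yet and marks them as seen.
function takeUnseenMatches(userId) {
    const unseen = (matchesByUser.get(userId) || []).filter((m) => !m.seen);
    unseen.forEach((m) => { m.seen = true; });
    return unseen;
}
```

Inside a normal route handler you could then respond with something like `res.send(takeUnseenMatches(userId))`, where `userId` comes from however your app identifies the logged-in user. The scheduled job writes, and the usual request/response cycle reads.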

The request npm module would allow you to do that. The following (server) app queries an external API every 10 seconds:

const request = require('request');

function doRequest() {
    request('http://www.randomtext.me/api/', function (error, response, body) {
        console.log('error:', error); 
        console.log('statusCode:', response && response.statusCode); 
        console.log('body:', body); 

        // do whatever you need to do with your result
        // and notify the user (... not clear what channel you want to use)
        // could be done with sockets, email, ... or text messages (Twilio) ...
    }); 
}

setInterval(doRequest, 10000); // <-- adapt your interval here (in milliseconds)

So this is a simple example of server-to-server requests ... hope that helps.
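
For the step inside the callback where the result is processed, one possible approach is to check the response body against each user's keywords and collect the hits before notifying anyone. The sketch below assumes a simple case-insensitive substring match. The `exampleWatchlist` data and the `findMatches` helper are made up for illustration; a real app would load the keywords from wherever it stores them.

```javascript
// Hypothetical shape: each user id mapped to the keywords that user registered.
const exampleWatchlist = {
    alice: ['node', 'scraper'],
    bob: ['python'],
};

// Returns [{ userId, keyword }] for every registered keyword found in `body`.
// Matching is a plain case-insensitive substring test.
function findMatches(body, watchlistByUser) {
    const text = String(body || '').toLowerCase();
    const matches = [];
    for (const [userId, keywords] of Object.entries(watchlistByUser)) {
        for (const keyword of keywords) {
            if (text.includes(keyword.toLowerCase())) {
                matches.push({ userId, keyword });
            }
        }
    }
    return matches;
}
```

Each returned entry can then be handed to whichever notification channel you pick (sockets, email, text message).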

Sebastian Hildebrandt
  • Look at the second paragraph of the question. They can already scrape the site using Node. The question is about doing it on a schedule instead of on demand, and about how to inform the user since they won't be able to just respond to the (now non-existent) HTTP request. – Quentin Jul 30 '18 at 13:05
  • Yes, but the channel was not specified, which I noted in the code comments ... and the question states "want the app to just do the scraping 2 times a day, without the need for client to make a request to server" (so it's not clear whether they already manage server-to-server scraping), which the answer shows ... so maybe no need to downvote it ;-) – Sebastian Hildebrandt Jul 30 '18 at 13:09
  • Yeah, I have already done the scraping part. I just needed the scheduling part. – faraz Jul 31 '18 at 03:42