
I am making a newsreader app and using Parse.com background jobs to collect links from the RSS feeds of newspapers. I use xmlreader.js and sax.js to parse the httpResponse, and use saveAll and beforeSave to periodically update the classes in the data browser.

I have multiple newspapers with multiple categories, making a total of more than 30 pairs (I will later have to add more pairs, as I would like to include regional newspapers). Until now I have been working with one newspaper and one category, The Hindu's sports section, and that is working fine. Making copies of these two functions and creating a job for each pair won't be efficient, I think.

Therefore, I wanted to ask whether I can convert both the job and the beforeSave into some kind of function, so that I can just pass in either the newspaper-category pair's class name or its URL and have everything done automatically.

Full Code - main.js

job -

Parse.Cloud.job("job_hindu_sports", function (request, response) {
return Parse.Cloud.httpRequest({
    url: 'http://www.thehindu.com/sport/?service=rss'
}).then(function(httpResponse) {
    var someXml = httpResponse.text;
    xmlreader.read(someXml, function (err, res){
        if(err) {
            response.error("Error " +err);
            return console.log(err);
        }   

        var listArray = [];
        res.rss.channel.item.each(function (i, item){
            var hinduSports = new HinduSports(); //@startswithaj - this part
            hinduSports.set("link", item.link.text());
            hinduSports.set("title", item.title.text());
            hinduSports.set("pubDate", item.pubDate.text());
            //console.log("pubDate - "+ item.pubDate.text());
            listArray.push(hinduSports);
        });

        var promises = [];
        Parse.Object.saveAll(listArray, {
                success: function(objs) {
                    promises.push(objs);
                    console.log("SAVED ALL!");
                },
                error: function(error) { 
                    console.log("ERROR WHILE SAVING - "+error);
                }   
            });
        return Parse.Promise.when(promises);        

    });
}).then(function() {
        response.success("Saving completed successfully.");
        },function(error) {
        response.error("Uh oh, something went wrong.");
});
});

beforeSave -

Parse.Cloud.beforeSave("HinduSports", function(request, response) {
//console.log("in beforeSave");
var query = new Parse.Query(HinduSports);
var linkText = request.object.get("link")
var titleText = request.object.get("title");
query.equalTo("link", linkText);
query.first({
  success: function(object) {
    //console.log("in query");
    if (object) {
        //console.log("found");
        if(object.get('title')!==titleText){
            console.log("title not same");
            object.set("title", titleText);
            response.success();
        }
        else{
            console.log("title same");
            response.error();
        }
    } else {
        console.log("not found");
        response.success();
    }
  },
  error: function(error) {
    response.error();
  }
});
});
Sahil Dave

2 Answers


In your job code you could query your datastore for all of the URLs you want to process, then iterate through the results, requesting each URL and passing the HTTP response to a function that does all the work.

So you would have (pseudo code)

function getDataForNewspaper(id, url){
    return (function(id) {
        return Parse.Cloud.httpRequest({
            url: url
        }).then(function(httpResponse){
            processDataForNewspaper(id, httpResponse);
        });
    })(id); //you need to have this in a closure so you can pass id to processDataFor...
}

function processDataForNewspaper(id, httpResponse){
    var someXml = httpResponse.text;
    //process your xml here
}

Parse.Cloud.job("get_data_for_all_newspapers", function (request, response) {
    var query = new Parse.Query("Newspaper"); // whatever class holds your newspapers
    query.find({
        success: function(list){
            // for each newspaper in the list, fetch its feed and process it
            list.forEach(function(newspaper){
                getDataForNewspaper(newspaper.id, newspaper.get("url"));
            });
        }
    });
});
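
If the job should only call response.success once every feed has been fetched, one option (a sketch, assuming getDataForNewspaper returns the httpRequest promise as above and that the newspapers live in an assumed Newspaper class with a url column) is to collect those promises and wait on them with Parse.Promise.when:

Parse.Cloud.job("get_data_for_all_newspapers", function (request, response) {
    var query = new Parse.Query("Newspaper"); // assumed class holding the feed URLs
    query.find().then(function (newspapers) {
        // kick off one request per newspaper and keep the returned promises
        var promises = newspapers.map(function (newspaper) {
            return getDataForNewspaper(newspaper.id, newspaper.get("url"));
        });
        return Parse.Promise.when(promises);
    }).then(function () {
        response.success("All feeds processed.");
    }, function (error) {
        response.error(error);
    });
});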

It's not the best explanation, but I hope it helps.

startswithaj
  • Thanks for replying. I have a few doubts. 1. What do you mean by the `(id) //you need to have this in a...` line? Is it a string that equals the name of the newspaper subclass I am dealing with? 2. During the processing of the XML I need an instance of the subclass to add data to, don't I? 3. After I process the XML, how would I work with beforeSave? Should I pass the id as the first argument? – Sahil Dave Apr 26 '14 at 13:16
  • You might be best posting your question in the parse.com forums. They will have other, more experienced Parse users to help you there. – startswithaj Apr 26 '14 at 13:19
  • I have had a bad experience there, with no one replying. – Sahil Dave Apr 26 '14 at 13:20
  • `(function(id){})(id)` is a closure; it keeps your id variable in scope so that when the promise executes it still has access to the id variable. http://foldingair.blogspot.com.au/2013/10/jquery-promises-wrapped-in-javascript.html – startswithaj Apr 26 '14 at 13:23
  • I may be able to do this if you just help me with one thing. I have edited the `job` code above. Can you please tell me, at Line 17, how I would be able to make an instance according to a URL? `var hinduSports = new HinduSports();` – Sahil Dave Apr 26 '14 at 13:25
  • I think you need to have a class called Newspaper that contains {id, newspaperName, newsPaperUrl}. Then your HinduSports class needs to be changed to something like newsArticle {id, newspaperId, link, title, pubdate}; each newsArticle can then be linked to a newsPaper by its id (see the sketch below these comments). – startswithaj Apr 26 '14 at 13:37
  • Ok, this looks fine, so I would just need the one newsArticle class while processing the XML and adding objects to it, right? A problem may arise because one feed has ~30 entries, and this would make the table very large and beforeSave slow. – Sahil Dave Apr 26 '14 at 13:46
  • Yeah, you can use the NewsArticle class for any newspaper article. I would say if you had thousands and thousands of articles it could make it slow, but you are simply executing a single-column equalTo (query.equalTo("link", linkText);) query in your beforeSave method. These should be very fast on MongoDB (what Parse uses) even if the field you're querying against isn't indexed. Don't worry about that. – startswithaj Apr 26 '14 at 14:01
  • I have modified my code and added it as an answer. Please take a look. The only issue I am having is that `saveAll()` doesn't get completed some of the time. – Sahil Dave Apr 26 '14 at 18:26
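
Following the comment thread, here is a minimal sketch of that data model and a single, generic beforeSave. The class names (Newspaper, NewsArticle) and column names (url, newspaperId, link, title, pubDate) are assumptions drawn from the discussion, not the actual schema:

// Sketch only: class and column names are assumptions based on the comments above.
var Newspaper = Parse.Object.extend("Newspaper");     // { name, category, url }
var NewsArticle = Parse.Object.extend("NewsArticle"); // { newspaperId, link, title, pubDate }

// One generic beforeSave can then deduplicate articles from every feed by link.
Parse.Cloud.beforeSave("NewsArticle", function(request, response) {
    var query = new Parse.Query(NewsArticle);
    query.equalTo("link", request.object.get("link"));
    query.first().then(function(existing) {
        if (existing && existing.id !== request.object.id) {
            // an article with this link is already stored; skip the save
            response.error("duplicate link");
        } else {
            response.success();
        }
    }, function(error) {
        response.error(error);
    });
});

With that in place, the per-newspaper jobs only need to create NewsArticle objects; the duplicate check no longer depends on which feed the article came from.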

With help from @startswithaj I modified my code to save all the articles in one class. The only thing left is to add a beforeSave method, but there is still a problem: saveAll only completes sometimes. For example, the first time I ran the code I got this in the log:

I2014-04-26T18:18:40.036Z] v93: Ran job job_get_data_for_all_newspapers with:
Input: {}
Result: Saving completed successfully.
I2014-04-26T18:18:40.926Z] Successfully retrieved 2
I2014-04-26T18:18:40.926Z] getData NEW & CAT ID - 1, 5 feedUrl http://www.thehindu.com/sport/?service=rss
I2014-04-26T18:18:40.927Z] getData NEW & CAT ID - 1, 4 feedUrl http://www.thehindu.com/news/national/?service=rss
I2014-04-26T18:18:40.927Z] promisesGetNP [object Object],[object Object]
I2014-04-26T18:18:41.479Z] processData NEW & CAT ID - 1, 5
I2014-04-26T18:18:41.622Z] listArray http://www.thehindu.com/sport/other-sports/mankirat-singh-sets-record/article5951540.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
I2014-04-26T18:18:41.628Z] promises undefined
I2014-04-26T18:18:41.629Z] promisesGetData 
I2014-04-26T18:18:41.629Z] Done getData? 
I2014-04-26T18:18:42.082Z] processData NEW & CAT ID - 1, 4
I2014-04-26T18:18:42.311Z] listArray http://www.thehindu.com/news/national/muslim-women-entitled-to-maintenance-even-after-divorce-supreme-court/article5951562.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
I2014-04-26T18:18:42.324Z] promises undefined
I2014-04-26T18:18:42.324Z] promisesGetData 
I2014-04-26T18:18:42.324Z] Done getData? 
I2014-04-26T18:18:42.324Z] done job

The second time, after deleting a few useless console.log calls, I got this. You can see there is a SAVED ALL!, which is logged in the success: function of saveAll:

I2014-04-26T18:20:53.130Z] v94: Ran job job_get_data_for_all_newspapers with:
Input: {}
Result: Saving completed successfully.
I2014-04-26T18:20:53.307Z] Successfully retrieved 2
I2014-04-26T18:20:53.307Z] getData NEW & CAT ID - 1, 5 feedUrl http://www.thehindu.com/sport/?service=rss
I2014-04-26T18:20:53.307Z] getData NEW & CAT ID - 1, 4 feedUrl http://www.thehindu.com/news/national/?service=rss
I2014-04-26T18:20:53.911Z] processData NEW & CAT ID - 1, 5
I2014-04-26T18:20:53.951Z] listArray http://www.thehindu.com/sport/other-sports/mankirat-singh-sets-record/article5951540.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
I2014-04-26T18:20:53.995Z] Done getData? 
I2014-04-26T18:20:54.200Z] SAVED ALL!
I2014-04-26T18:20:54.818Z] processData NEW & CAT ID - 1, 4
I2014-04-26T18:20:55.016Z] listArray http://www.thehindu.com/news/national/muslim-women-entitled-to-maintenance-even-after-divorce-supreme-court/article5951562.ece?utm_source=RSS_Feed&utm_medium=RSS&utm_campaign=RSS_Syndication
I2014-04-26T18:20:55.031Z] Done getData? 
I2014-04-26T18:20:55.031Z] done job

My new code can be found here. The new code starts at Line 150.
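
For what it's worth, the likely reason saveAll only completes sometimes is that nothing in the chain waits for it: xmlreader.read takes a callback, and the job's promise resolves before saveAll has finished. Below is a minimal sketch of a processData function that wraps the callback in a Parse.Promise and only resolves once saveAll settles (the NewsArticle class and the newspaperId/categoryId columns are assumptions, following the thread above):

function processDataForNewspaper(newspaperId, categoryId, httpResponse) {
    var promise = new Parse.Promise();
    xmlreader.read(httpResponse.text, function (err, res) {
        if (err) {
            return promise.reject(err);
        }
        var articles = [];
        res.rss.channel.item.each(function (i, item) {
            var article = new NewsArticle(); // assumed generic article class
            article.set("newspaperId", newspaperId);
            article.set("categoryId", categoryId);
            article.set("link", item.link.text());
            article.set("title", item.title.text());
            article.set("pubDate", item.pubDate.text());
            articles.push(article);
        });
        // resolve only after saveAll has actually finished (or failed)
        Parse.Object.saveAll(articles).then(function () {
            promise.resolve();
        }, function (error) {
            promise.reject(error);
        });
    });
    return promise;
}

If each call like this returns its promise and the job gathers them with Parse.Promise.when before calling response.success, the job would only report success after every feed has been saved.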

Sahil Dave