I have made an extension that tracks which manga a person reads on a manga site and lists the last chapter they read for each one on their favorites page. I've recently come up with a feature that would make the extension a bit more useful: giving the user the option to track only manga that they have favorited on the site. As they read, the extension would check in the background whether the current manga is in their favorites, and save it only if it is.

The website has a favorites page that lists all of the manga a person has favorited. I would like to keep grabbing the name of each manga listed on that page in the background, hidden from the user.

So my question is: is there any way to fetch the HTML of a specific page in the background and repeatedly extract specific data, such as the text of certain elements, into an array, without the user actually having to be on the favorites page?

Edit: Solution

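var barray = [];

// Fetch the bookmarks page in the background and pass its HTML to callback
// (or null if the request fails).
function getbm(callback) {
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function() {
        if (xhr.readyState !== 4) {
            return; // request not finished yet
        }
        callback(xhr.status === 200 ? xhr.responseText : null);
    };
    var url = 'http://mangafox.me/bookmark/index.php?status=all';
    xhr.open('GET', url, true);
    xhr.send();
}

// Parse the HTML with jQuery and store the bookmarked titles.
function res(data) {
    if (data === null) {
        return; // request failed; keep the previously stored list
    }
    barray = []; // start fresh so repeated runs don't accumulate duplicates
    // Wrap the parsed nodes in a detached <div> so .find() can search them.
    var parsed = $('<div />').append($.parseHTML(data));
    parsed.find('h2.title').each(function() {
        var bmanga = $(this).children('a.title').text();
        barray.push({"manga": bmanga});
    });
    chrome.storage.local.set({'bData': barray});
}

getbm(res);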
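
The call above fetches the favorites only once. To keep the list current in the background, one option is to schedule the fetch with `chrome.alarms`. This is only a sketch: `scheduleRefresh` and the alarm name are hypothetical, the 5-minute interval is an arbitrary choice, and the extension would need the `"alarms"` permission in its manifest.

```javascript
// Hypothetical helper: `alarms` is expected to be chrome.alarms and
// `refresh` a function that re-fetches the favorites, e.g. getbm(res).
var ALARM_NAME = 'refresh-favorites'; // illustrative name

function scheduleRefresh(alarms, refresh, minutes) {
    // Ask Chrome to fire the alarm repeatedly.
    alarms.create(ALARM_NAME, { periodInMinutes: minutes });
    alarms.onAlarm.addListener(function(alarm) {
        // Ignore alarms created elsewhere in the extension.
        if (alarm.name === ALARM_NAME) {
            refresh();
        }
    });
    refresh(); // fetch once right away instead of waiting for the first alarm
}

// In the background script:
// scheduleRefresh(chrome.alarms, function() { getbm(res); }, 5);
```

Passing `chrome.alarms` in as an argument keeps the helper easy to try outside the extension with a stand-in object.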
Norman V

1 Answer

It heavily depends on how the page in question is constructed.

If the page is static (the HTTP response already includes the data you need), then scraping the page via XMLHttpRequest is the way to go.

If the page is dynamic (no data initially, and JavaScript on the page then queries the server to fill it in), then the XHR route will not work on the page itself. Instead, you can observe the network requests made by that page and replicate them.

Of note: while it's unlikely, check whether the site has a public API. That would save you the reverse-engineering effort and let you avoid the grey area of automated data scraping.
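
For the dynamic case, replicating the page's own request might look roughly like the sketch below. Everything specific here is an assumption: the endpoint URL, the JSON response shape (an array of objects with a `title` field), and the function name `fetchFavoriteTitles` are hypothetical, and the real values would come from watching the Network tab of the developer tools.

```javascript
// Sketch only: the endpoint and response shape are hypothetical.
// `fetchImpl` defaults to the global fetch and can be swapped for a stub.
async function fetchFavoriteTitles(endpoint, fetchImpl = fetch) {
    // 'include' sends the site's cookies so the request is made as the logged-in user.
    const response = await fetchImpl(endpoint, { credentials: 'include' });
    if (!response.ok) {
        return null; // signal failure to the caller
    }
    const items = await response.json();
    // Assumes each item looks like { title: "..." }.
    return items.map(function(item) { return item.title; });
}

// Example call with a made-up URL:
// fetchFavoriteTitles('https://example.invalid/favorites.json').then(console.log);
```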

Also, see whether you can tell from the page you're already tracking if the item is favourited. That would be easier than scraping another page.
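
As a rough illustration of checking on the page being read, a content script could look for whatever element the site uses to mark a favourited series. The selector below is purely a placeholder and `isFavorited` is a hypothetical helper; the real marker has to be found by inspecting the site's markup.

```javascript
// Placeholder selector: replace with whatever the site actually renders
// for a series that is already in the user's favorites.
var FAVORITE_MARKER = '#bookmark.active';

// `root` is anything with querySelector, normally `document`.
function isFavorited(root, selector) {
    return root.querySelector(selector || FAVORITE_MARKER) !== null;
}

// In a content script running on the reading page:
if (typeof document !== 'undefined') {
    console.log('Favourited:', isFavorited(document));
}
```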

Xan
  • Thank you, it worked! However, when I log the data I receive, it is the entire HTML as one giant string. How would I make it searchable? For example, searching it with the following code: `var manga = $('h2.title').children('a.title').text();` – Norman V Dec 04 '14 at 14:13
  • http://stackoverflow.com/questions/20196442/parse-xmlhttprequest-responsetext-with-jquery – Xan Dec 04 '14 at 14:22
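
Besides the jQuery approach in the linked question, the response string can also be made searchable with the browser's built-in `DOMParser`. This is a minimal sketch, assuming the code runs where `DOMParser` exists (a page or content script) and that the titles sit in `h2.title > a.title` as in the question; `extractTitles` is a hypothetical name.

```javascript
// Turn the raw HTML string into a document and collect the title texts.
// `parser` can be passed in; by default a DOMParser is created.
function extractTitles(html, parser) {
    parser = parser || new DOMParser();
    var doc = parser.parseFromString(html, 'text/html');
    // Same elements as $('h2.title').children('a.title') in the question.
    var links = doc.querySelectorAll('h2.title > a.title');
    return Array.from(links, function(a) { return a.textContent.trim(); });
}

// Usage with the XHR response:
// var titles = extractTitles(xhr.responseText);
```

Unlike inserting the string into the live page, parsing it into a separate document does not run scripts from the fetched HTML.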