
I'm trying to write a Chrome extension that grabs all of a website's data. Tutorials mostly talk about 'modifying' a page, which seems to imply that you cannot read the whole page.

I found one Chrome API, pageCapture, which saves ALL resources from a page. I assume I could pull the HTML out of that and crawl it afterwards, but that isn't desirable since it takes a lot more space and overhead.

I'd prefer some way to crawl the active tab directly. The tabs API lets you get the current Tab, but the Tab object doesn't seem to have a content attribute.

There must be a better way to do this. Does anyone know how to get the current page's HTML?

Jono
  • Possible duplicate of [Getting the source HTML of the current page from chrome extension](http://stackoverflow.com/questions/11684454/getting-the-source-html-of-the-current-page-from-chrome-extension) – Sani Huttunen Mar 27 '16 at 03:00
  • I should say that I'm looking for a content script solution to the problem; I'll update when I've created it. – Jono Apr 06 '16 at 02:57
  • Could you please share the tutorial link with me? – jsina Aug 11 '20 at 09:30

1 Answer


I think this answer will help you: Loading html into page element (chrome extension)
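For the active tab specifically, a content script may be the lighter option, since it runs inside the page and can read its DOM. Below is a minimal sketch of that idea, not a tested extension: the file name `content.js`, the helper `getPageHtml`, and the `{ type: 'GET_HTML' }` message shape are all made up for illustration, and the script is assumed to be declared under "content_scripts" in manifest.json.

```javascript
// content.js (hypothetical file name): runs inside the page being viewed.

// Serialize the document's root element to an HTML string.
// Note: outerHTML reflects the current DOM and does not include the doctype.
function getPageHtml(doc) {
    return doc.documentElement.outerHTML;
}

// Reply with the page HTML when the extension asks for it.
// The { type: 'GET_HTML' } message shape is an assumption of this sketch.
if (typeof chrome !== 'undefined' && chrome.runtime && chrome.runtime.onMessage) {
    chrome.runtime.onMessage.addListener(function (message, sender, sendResponse) {
        if (message && message.type === 'GET_HTML') {
            sendResponse({ html: getPageHtml(document) });
        }
    });
}
```

From a popup or background page you could then look up the active tab with `chrome.tabs.query({ active: true, currentWindow: true }, ...)` and ask it for its HTML with `chrome.tabs.sendMessage(tabs[0].id, { type: 'GET_HTML' }, callback)`.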

I have another solution that may help: save the websites in your Chrome bookmarks, then collect all of their URLs using:

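var uploadUrls_bm_urls = '';  // every bookmark URL, each quoted and followed by a comma
var uploadUrls_temp = '';     // longest prefix of that list that fits within maxUrls characters
var maxUrls = 1000;           // limit on the string length in characters (a number, not a string)

/* Collect the URLs of all bookmarks in the browser. */
/* @param {Array} parentNode - array of bookmark tree nodes, e.g. from chrome.bookmarks.getTree */

function fetch_bookmarks(parentNode) {
    parentNode.forEach(function(bookmark) {
        // Folders have no url property; only real bookmarks do.
        if (bookmark.url !== undefined && bookmark.url !== null) {
            uploadUrls_bm_urls = uploadUrls_bm_urls + '"' + bookmark.url + '",';
            if (uploadUrls_bm_urls.length <= maxUrls) {
                uploadUrls_temp = uploadUrls_bm_urls;
            }
        }
        // Recurse into folders.
        if (bookmark.children) {
            fetch_bookmarks(bookmark.children);
        }
    });
}

// Start the traversal from the root (requires the "bookmarks" permission).
chrome.bookmarks.getTree(fetch_bookmarks);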

After that, you can iterate over all the URLs and use the "load" function described in the link above (Loading html into page element (chrome extension)).
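The iteration step could look roughly like the sketch below. It swaps the "load" function for the standard `fetch()`, which is my substitution, not what the linked answer uses. `parseUrlList` and `loadPages` are hypothetical helper names, and the parser assumes no URL contains a double quote. Cross-origin requests from an extension also need matching host permissions in the manifest.

```javascript
// Turn the '"url1","url2",' string built by fetch_bookmarks into an array.
// Assumes no URL contains a double quote.
function parseUrlList(quotedList) {
    return Array.from(quotedList.matchAll(/"([^"]*)"/g), function (match) {
        return match[1];
    });
}

// Download the HTML of each URL, one at a time.
// fetchFn defaults to the standard fetch(); pass a stub to try this offline.
async function loadPages(urls, fetchFn = fetch) {
    const pages = {};
    for (const url of urls) {
        try {
            const response = await fetchFn(url);
            pages[url] = await response.text();
        } catch (err) {
            pages[url] = null; // network error or blocked request
        }
    }
    return pages;
}
```

Calling `loadPages(parseUrlList(uploadUrls_temp))` would then resolve to an object mapping each URL to its HTML, or to `null` where the request failed.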

Let me know if this helped you or not.

Thanks

Community