
I'm building a Chrome extension that injects a content script into YouTube pages to scrape the page's entire HTML. I don't need the HTML parsed within the extension, but the part that matters most is the recommendation list. I understand that YouTube is a Single Page Application, so its content loads dynamically. I've tried capturing the HTML in response to several events: transitionend, yt-navigate-finish (problem: this event fires 3 times), and yt-page-data-updated. In every case the recommendation list is missing from the scraped HTML. For example, this code from the solution mentioned above doesn't work:

My content.js:

(document.body || document.documentElement).addEventListener('transitionend', function (event) {
    // Note: a transitionend event on 'progress' with propertyName 'width' does not always fire
    if (event.target.id === 'progress' && event.propertyName === 'width') {
        var x = document.body.outerHTML;
        chrome.runtime.sendMessage({
            url: location.href,
            source: x,
            // source: document.getElementsByTagName("ytd-compact-video-renderer"),
            message: 'readHTML'
        });
    }
});

Which event should I listen for in order to capture the recommendation list? I've tried monitorEvents() on the elements that host the list ('secondary', 'secondary-inner', 'items', etc.), but I couldn't find any event that signals the list has fully loaded. Any help is appreciated. Thanks!

  • Use MutationObserver. P.S. Regarding the line with getElementsByTagName, note that messaging can't send DOM elements. – wOxxOm Sep 29 '20 at 03:37
  • @wOxxOm Thanks for pointing that out about tag name! I've looked into MutationObserver, but I'm not sure how it can observe when the recommendation list has finished loading – Charlotte Ji Oct 08 '20 at 01:39
  • @wOxxOm Also, my plugin is able to detect changes in URL and inject the content script using tabs.onUpdated in background.js. I've switched to using setTimeout in the content script to wait 5 seconds so that in most cases the webpage has finished loading, but it's probably not the best way to do it – Charlotte Ji Oct 08 '20 at 01:51
  • Use a short *sliding timeout* (also called *debounce*), e.g. 100ms, in the MutationObserver callback: once the changes stop occurring for more than 100ms, you can treat loading as finished. – wOxxOm Oct 08 '20 at 04:40
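
For reference, the debounce suggestion from the comments could look roughly like the sketch below. It is only an illustration under assumptions, not a tested solution for YouTube: the helper names debounce and sendPageHtml are made up here, the 100 ms quiet period is simply the figure from the comment, and observing document.documentElement with childList and subtree is one possible choice of target. The message shape is copied from the question, and the HTML is sent as a string because messaging can't send DOM elements.

```javascript
// Sketch only: debounce and sendPageHtml are hypothetical helper names,
// and the 100 ms quiet period is taken from the comment thread.

// Returns a function that runs callback only after quietMs have passed
// without another call (a sliding timeout).
function debounce(callback, quietMs) {
  let timer = null;
  return function () {
    if (timer !== null) {
      clearTimeout(timer); // a new change arrived, so restart the wait
    }
    timer = setTimeout(function () {
      timer = null;
      callback();
    }, quietMs);
  };
}

// Sends the serialized page to the background script.
// outerHTML is a string, so it can be passed through messaging.
function sendPageHtml() {
  chrome.runtime.sendMessage({
    url: location.href,
    source: document.body.outerHTML,
    message: 'readHTML'
  });
}

// Only wire up the observer where a DOM exists (i.e. in the content script).
if (typeof MutationObserver !== 'undefined' && typeof document !== 'undefined') {
  const onQuiet = debounce(sendPageHtml, 100);
  const observer = new MutationObserver(onQuiet);
  // Assumption: watching the whole document for added/removed nodes.
  observer.observe(document.documentElement, { childList: true, subtree: true });
}
```

Because the observer is never disconnected, this sketch sends a message every time the page changes and then goes quiet, including after in-page navigation. If a single capture per page is wanted, the callback would need to call observer.disconnect() or compare location.href against the last URL sent.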
