Problem: I am working on a browser extension in JavaScript that needs to read the HTML of a page after everything has rendered. No matter which method I use, I can only retrieve the pre-rendered source. The website uses Ember.js to generate the content of the page.

Example: Site: https://www.playstation.com/en-us/explore/games/ps4-games/?console=ps4

When I right-click and choose View Source, I get the page as it was before the content loaded. What I want is the source as it appears when I right-click and choose Inspect Element, after the content has loaded.

What I've tried:

background.js

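var acceptedURLPattern = "playstation.com";
// Assumption: browserString is defined elsewhere in the original extension;
// "browser" matches the API namespace used in the rest of this file.
var browserString = "browser";

function tabUpdatedCallback(tabID, changeInfo, tab) {
    // Ignore tabs that are not on the target site.
    if (!tab.url || tab.url.indexOf(acceptedURLPattern) === -1) return;

    // Injected into the page: send the serialized DOM back once "load" fires.
    var eventJsonScript = {
        code: 'console.log("Script Injected"); window.addEventListener("load", (event) => { ' + browserString + '.runtime.sendMessage({ "html": document.documentElement.outerHTML }); });'
    };

    browser.tabs.executeScript(tabID, eventJsonScript);
}

function handleHTMLMessage(request, sender, sendResponse) {
    // Log the message (including the page HTML) sent by the injected script.
    console.log(request);
}

browser.tabs.onUpdated.addListener(tabUpdatedCallback);
browser.runtime.onMessage.addListener(handleHTMLMessage);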

The script above injects an event listener into the page whose source I want to grab. When the page fires its "load" event, the listener sends a message containing that source back to background.js.

I've tried reading document.documentElement.innerHTML instead of outerHTML, and I've tried replacing the listener with document.addEventListener("DOMContentLoaded", ...), but neither change had any effect.

I've also looked at these questions: "Get javascript rendered html source using phantomjs" and "get a browser rendered html+javascript". Both use PhantomJS to load and execute the page and then return the HTML. In my case, I need to grab the page that has already been rendered in the user's browser.

Thanks for the help in advance!

Edit #1: I took a look at MutationObserver as mentioned by @wOxxOm and changed the eventJsonScript variable to look like this:

var eventJsonScript = {
    code: "console.log(\"Script Injected\"); var mutationObserver = new MutationObserver( (mutations) => { mutations.forEach((mutation) => {if( JSON.stringify(mutation).indexOf(\"Yakuza\") != -1) { console.log(mutation); } });}); mutationObserver.observe(document.documentElement, {attributes: true, characterData: true, childList: true, subtree: true, attributeOldValue: true, characterDataOldValue: true}); mutationObserver.takeRecords()"
};

However, despite the site clearly having a section for Yakuza 6, the condition never matches and nothing is logged. I removed the if condition in the injected script to verify that mutation events do fire normally; they do, but they don't seem to contain the information I'm looking for.
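A likely reason the check never matches: the fields of a MutationRecord are typically exposed as getters rather than own enumerable properties, so JSON.stringify(mutation) tends to produce little more than "{}", and in any case DOM nodes are not serialized with their text. The following is a minimal sketch that inspects the text of added nodes instead. The helper names nodeText and mutationsContainText are hypothetical, and the "Yakuza" needle is taken from the edit above.

```javascript
// Hypothetical helper: safely read the text of a node-like object.
function nodeText(node) {
  return (node && node.textContent) || "";
}

// Hypothetical helper: true if any record added a node (or changed a text
// node) whose text contains `needle`. `records` is an array of
// MutationRecord-like objects.
function mutationsContainText(records, needle) {
  return records.some((record) => {
    const addedMatch = Array.from(record.addedNodes || []).some((node) =>
      nodeText(node).includes(needle)
    );
    const textMatch =
      record.type === "characterData" &&
      nodeText(record.target).includes(needle);
    return addedMatch || textMatch;
  });
}

// Browser-only part: runs in the injected script, skipped elsewhere.
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  const observer = new MutationObserver((records) => {
    if (mutationsContainText(records, "Yakuza")) {
      console.log("Matching content was rendered");
    }
  });
  observer.observe(document.documentElement, {
    childList: true,
    subtree: true,
    characterData: true,
  });
}
```

Note that content rendered before the observer is attached produces no records, so a one-time check of document.documentElement.textContent at injection time may still be needed.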

  • Modern web sites usually render themselves **after** DOMContentLoaded was fired so you need to start your code after a delay which you can hardcode via `setTimeout` or try to detect the changes in the page using MutationObserver in your injected code. – wOxxOm Jun 26 '18 at 11:54
  • Thanks for the suggestion @wOxxOm. setTimeout doesn't really work for me because the site load time can vary between systems. I looked into MutationObserver as you mentioned which seems like it could give me information that I'm looking for, but events aren't firing as I am expecting. I added an edit to my question describing the changes I made, but if I need to make a separate question for information on MutationObserver, I can. – Alex Marich Jun 26 '18 at 15:36
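
The two suggestions in these comments can be combined: rather than a fixed delay, the injected script can wait until the DOM has stopped changing for a short period. The sketch below is one possible way to do that, not something from the thread. The helper name createQuietDetector and the 500 ms quiet period are assumptions, and browser.runtime.sendMessage is used as in the question.

```javascript
// Hypothetical helper: returns a `bump` function. onQuiet runs once no call
// to `bump` has happened for quietMs milliseconds.
function createQuietDetector(quietMs, onQuiet) {
  let timer = null;
  return function bump() {
    if (timer !== null) clearTimeout(timer);
    timer = setTimeout(onQuiet, quietMs);
  };
}

// Browser-only part: runs in the injected script, skipped elsewhere.
if (typeof MutationObserver !== "undefined" && typeof document !== "undefined") {
  let observer;
  // Assumption: 500 ms without DOM changes means rendering has finished.
  const bump = createQuietDetector(500, () => {
    observer.disconnect();
    browser.runtime.sendMessage({ html: document.documentElement.outerHTML });
  });
  observer = new MutationObserver(() => bump());
  observer.observe(document.documentElement, { childList: true, subtree: true });
  bump(); // start the timer in case the page is already quiet
}
```

The quiet period is a trade-off: a longer value is more robust on slow systems but delays the capture, and a page that mutates continuously (animations, tickers) would never become quiet without an additional upper time limit.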

1 Answer

The good news is that someone has already written the code to do this in Ember. You can find it here:

https://github.com/emberjs/ember-test-helpers/blob/031969d016fb0201fd8504ac275526f3a0ab2ecd/addon-test-support/%40ember/test-helpers/settled.js

This is the code Ember tests use to wait until everything is rendered and complete, or "settled".

The bad news is it is a nontrivial task to extract it correctly for your extension.
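
Without extracting Ember's internals, the general shape of such a wait is a polling loop: check a "settled" condition repeatedly until it holds or a time limit passes. The following is a generic sketch of that idea, not the code from the linked file. The name pollUntil and its default interval and timeout are assumptions, and the predicate you pass in would have to encode whatever "settled" means for the page.

```javascript
// Hypothetical helper: resolves with the first truthy value returned by
// predicate(), checking every intervalMs; rejects after timeoutMs.
function pollUntil(predicate, { intervalMs = 10, timeoutMs = 10000 } = {}) {
  return new Promise((resolve, reject) => {
    const startedAt = Date.now();
    (function check() {
      let result;
      try {
        result = predicate();
      } catch (error) {
        reject(error); // a throwing predicate ends the wait
        return;
      }
      if (result) {
        resolve(result);
        return;
      }
      if (Date.now() - startedAt >= timeoutMs) {
        reject(new Error("pollUntil timed out"));
        return;
      }
      setTimeout(check, intervalMs);
    })();
  });
}
```

For example, pollUntil(() => document.querySelector(".some-rendered-element")) would wait for a particular element to appear; that selector is purely illustrative.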

Basically, you will want to:

  1. Wait till the page is loaded (window.load event)
  2. setTimeout at least 200 ms to ensure the Ember app has booted.
  3. Wait until settled, using code linked above.
  4. Wait until browser is idle (requestIdleCallback in latest Chrome, or get a polyfill).
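
The four steps above can be sketched as a single async sequence. This is only an outline under stated assumptions: captureRenderedHtml and browserDeps are hypothetical names, and the waitUntilSettled entry is a placeholder that resolves immediately, standing in for logic that would have to be adapted from the linked settled.js.

```javascript
// Hypothetical helper: runs the four waits in order, then returns the HTML.
// All dependencies are passed in so the sequence can be exercised anywhere.
async function captureRenderedHtml({
  waitForLoad,
  delay,
  waitUntilSettled,
  waitForIdle,
  getHtml,
  bootDelayMs = 200,
}) {
  await waitForLoad();      // step 1: window "load" event
  await delay(bootDelayMs); // step 2: give the Ember app time to boot
  await waitUntilSettled(); // step 3: app-specific "settled" check
  await waitForIdle();      // step 4: browser is idle
  return getHtml();
}

// Browser-side dependencies; null when no window is available.
const browserDeps = typeof window === "undefined" ? null : {
  waitForLoad: () =>
    document.readyState === "complete"
      ? Promise.resolve()
      : new Promise((resolve) =>
          window.addEventListener("load", resolve, { once: true })),
  delay: (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  // Placeholder (assumption): replace with logic adapted from settled.js.
  waitUntilSettled: () => Promise.resolve(),
  waitForIdle: () =>
    new Promise((resolve) =>
      "requestIdleCallback" in window
        ? window.requestIdleCallback(() => resolve())
        : setTimeout(resolve, 0)),
  getHtml: () => document.documentElement.outerHTML,
};
```

In the injected script this could be used as captureRenderedHtml(browserDeps).then((html) => browser.runtime.sendMessage({ html })). Checking document.readyState first matters because a script injected after the page has loaded would otherwise wait for a "load" event that has already fired.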

Hope this helps get you started.

Gaurav