Given a URL, how can a script find which resources are loaded?

Question

Given a URL (e.g. localhost:8000), how can a script find out which resources a browser will load (via HTTP requests)?

For example, let's suppose that the / route responds with:

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>Test</title>
        <link rel="stylesheet" href="/css/style.css">
    </head>
    <body>
        some content
        <img src="/foo.png" alt="">
    </body>
</html>

The resources that will be loaded are:

/css/style.css
/foo.png

This is simple (just a DOM iteration with cheerio or similar), but it is not as native as I think it should be.

Iterating over the response HTML will not catch additional resources such as CSS @imports, background-image URLs and so on.
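To illustrate that limitation, here is a minimal sketch of the static approach using only a regular expression over the HTML string. The function name `listStaticResources` and the pattern are assumptions made for this example, not part of cheerio or any other library, and a real implementation would use a proper HTML parser.

```javascript
// Naive static scan: collect href/src values from <link>, <img> and <script> tags.
// listStaticResources is a hypothetical helper written for this illustration.
function listStaticResources(html) {
    var pattern = /<(?:link|img|script)\b[^>]*?\b(?:href|src)\s*=\s*["']([^"']+)["']/gi;
    var urls = [];
    var match;
    while ((match = pattern.exec(html)) !== null) {
        urls.push(match[1]);
    }
    return urls;
}

// The sample page from the question
var html = [
    '<!DOCTYPE html>',
    '<html lang="en">',
    '    <head>',
    '        <meta charset="UTF-8">',
    '        <title>Test</title>',
    '        <link rel="stylesheet" href="/css/style.css">',
    '    </head>',
    '    <body>',
    '        some content',
    '        <img src="/foo.png" alt="">',
    '    </body>',
    '</html>'
].join("\n");

console.log(listStaticResources(html));
```

This finds /css/style.css and /foo.png, but anything referenced from inside style.css (an @import or a background-image) never appears in the HTML, so a scan like this cannot see it.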

What is the native way to get the list of CSS files, images and possibly other resources that the browser loads?

Is it perhaps possible via jsdom?

http://stackoverflow.com/questions/19786525/how-to-list-loaded-resources-with-selenium-phantomjs — adeneo, Mar 08 '15 at 18:29
@adeneo `webpage` seems not to be available... However it's an interesting answer. Still need more information.. — Ionică Bizău, Mar 08 '15 at 18:33
What I'm trying to say is that you probably need a headless browser that actually loads the page to see exactly what resources are loaded if you're going to include things like @imports and XMLHttpRequest etc. — adeneo, Mar 08 '15 at 18:38
@adeneo That would be even better and probably the *native* way. Could you post an answer how to do that? — Ionică Bizău, Mar 08 '15 at 18:40
@pjs Well, I never liked to answer to questions like yours. :-) Maybe I will reply you later where I needed it. — Ionică Bizău, Mar 08 '15 at 19:13

score 1 · Accepted Answer · edited May 23 '17 at 11:43

As @adeneo suggested, the missing keyword was "headless browser". I found this very simple to do with the zombie library. Below is a small example; the documentation is a great resource for more details.

// Dependencies
var Browser = require("zombie");

// Route requests for the hostname "localhost" to port 9000
Browser.localhost("localhost", 9000);

// Load the page from localhost,
// including js, css, images and iframes
var browser = new Browser({
  features: "scripts css img iframe"
});

// Open the page and list the resources
browser.visit("/", function(error) {
    console.log(browser.resources.map(function (c) {
        return c.request.url;
    }));
});
