How to identify the source of js/css file inclusion in a website

Question

I wonder if there is a way to track the exact path by which a specific .js or .css file gets included in a website, especially by third-party scripts, e.g. from advertisers.

Let's assume that we have website X. On this site, a script A.js is included. However, this file loads A1.js and A2.js. At the same time, this site loads B.js as a third party file which includes B1.css.

Here is the question: how can I track the chain of included files, e.g. X -> A.js -> (A1.js, A2.js)?

Notes:

In an asynchronous world, it is no longer possible to track outgoing requests and sort them in order.
You cannot rely on the Referer HTTP header since it always points to X.
Ideally I would track it in Chrome DevTools (F12).

It's very early in the morning so bear with me. But aren't all 3rd party scripts that are being loaded also being appended to the body as a script tag? Maybe you could listen for that? — noa-dev, Sep 24 '18 at 06:32
That's true. However I think it is impossible to track which script appended the tag considering that it could be after some time according to different latencies between resources being loaded asynchronously. — Athlan, Sep 24 '18 at 08:00
This might help you listen for DOM changes; that way you would have a dynamic way to catch all resources. Of course you will never know whether that was all of them, even after 10 minutes. But at least you would have all that are being loaded https://stackoverflow.com/questions/3219758/detect-changes-in-the-dom — noa-dev, Sep 24 '18 at 09:58
If there is no simple solution to track the script chain, i.e. which script included which, it's not solving my problem :<. — Athlan, Sep 24 '18 at 16:10

score 1 · Accepted Answer · answered Sep 26 '18 at 16:35

You can already track the path of every included file in Chrome DevTools.

As an experiment, I set up one web server on 127.0.0.42 (accessible via 127.0.0.42) and another on 127.0.0.43 (accessible from third.party.domain.tld), so the two servers are isolated from each other.

The website X runs on 127.0.0.42 and has an overly simplistic webpage with this code:

<head>
    <script src="js/A.js"></script>
    <script src="http://third.party.domain.tld/js/B.js"></script>
</head>

It includes both a local A.js file and a B.js file from a third party.

The A.js file has code of the same complexity level as our X page:

console.log("Hello from A.js!");

function include_script(source) {
    var script = document.createElement("script");
    script.src = source;

    document.head.appendChild(script);
}

include_script("js/A1.js");
include_script("js/A2.js");

Note that the 7th line here is the line where the script file gets appended.

A1.js and A2.js each contain a single line that logs a distinct message to the console.

The B.js file hosted on the third party server contains this code:

console.log("Hello from B.js!");

function include_style(source) {
    var link = document.createElement("link");
    link.rel = "stylesheet";
    link.href = source;

    document.head.appendChild(link);
}

include_style("http://third.party.domain.tld/css/B.css");

This loads a stylesheet from the third party website.

Now open DevTools, switch to the Network tab, and look at the Initiator column:

The requests highlighted in red show A1.js and A2.js being loaded, initiated by the 7th line of A.js.

The request in blue shows B.css being loaded, initiated by the 8th line of B.js.

The green requests show the inclusion of both A.js and B.js from (index), which means the index page itself requested them.

Hover over the name of each resource to reveal its original location.

Click the initiator link (A.js:7 or (index)) to jump to the exact line where the resource load was triggered.
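The same initiator data can also be read programmatically. The sketch below assumes you exported the Network tab as a HAR file and that each entry carries Chrome's non-standard `_initiator` field (a `url` for parser-initiated requests, a `stack.callFrames` array for script-initiated ones). The helper names `initiatorOf`, `initiatorMap` and `chainFor`, and the `sampleHar` data mirroring the experiment above, are hypothetical; treat the field shapes as an assumption and check them against your own export.

```javascript
// Hypothetical, trimmed-down HAR data shaped like a Chrome export of the experiment.
const sampleHar = { log: { entries: [
  { request: { url: "http://127.0.0.42/" }, _initiator: { type: "other" } },
  { request: { url: "http://127.0.0.42/js/A.js" },
    _initiator: { type: "parser", url: "http://127.0.0.42/", lineNumber: 1 } },
  { request: { url: "http://third.party.domain.tld/js/B.js" },
    _initiator: { type: "parser", url: "http://127.0.0.42/", lineNumber: 2 } },
  { request: { url: "http://127.0.0.42/js/A1.js" },
    _initiator: { type: "script", stack: { callFrames: [
      { functionName: "include_script", url: "http://127.0.0.42/js/A.js", lineNumber: 6, columnNumber: 18 }
    ] } } },
  { request: { url: "http://third.party.domain.tld/css/B.css" },
    _initiator: { type: "script", stack: { callFrames: [
      { functionName: "include_style", url: "http://third.party.domain.tld/js/B.js", lineNumber: 7, columnNumber: 18 }
    ] } } }
] } };

// Return { url, line } for whatever triggered this request, or null if unknown.
// Assumption: lineNumber is 0-based (as in the DevTools protocol), so add 1.
function initiatorOf(entry) {
  const init = entry._initiator;
  if (!init) return null;
  // Script-initiated: the top call frame is the line that triggered the load.
  const frame = init.stack && init.stack.callFrames && init.stack.callFrames[0];
  if (frame) return { url: frame.url, line: frame.lineNumber + 1 };
  // Parser-initiated: a <script> or <link> tag in the markup.
  if (init.url) {
    const line = typeof init.lineNumber === "number" ? init.lineNumber + 1 : null;
    return { url: init.url, line };
  }
  return null;
}

// Map every requested URL to its initiator.
function initiatorMap(har) {
  const map = new Map();
  for (const entry of har.log.entries) {
    map.set(entry.request.url, initiatorOf(entry));
  }
  return map;
}

// Walk initiators upwards to get the inclusion chain, root first.
function chainFor(url, map) {
  const chain = [url];
  const seen = new Set(chain); // guards against cycles
  let current = map.get(url);
  while (current && !seen.has(current.url)) {
    chain.unshift(current.url);
    seen.add(current.url);
    current = map.get(current.url);
  }
  return chain;
}

const map = initiatorMap(sampleHar);
for (const url of map.keys()) {
  console.log(chainFor(url, map).join(" -> "));
}
```

With a real export you would replace `sampleHar` with the parsed HAR file, e.g. `JSON.parse(fs.readFileSync(path, "utf8"))` in Node.js.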

Thanks a lot! Initiator was what I was looking for. Via DevTools I can even automate it. Thank you so much! — Athlan, Sep 26 '18 at 16:56
BTW nice engagement, you even prepared files matching the use case in the question. — Athlan, Sep 26 '18 at 16:58

score 0 · Answer 2 · answered Sep 24 '18 at 06:24

HTTrack is a simple open-source web scraper. It downloads all assets into folders named after their domain name. This is a relatively easy way to understand which assets are being used and from which domain.

https://www.httrack.com/
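The per-domain grouping idea can be sketched without downloading anything. This is only an illustration, not how HTTrack itself works: `groupByHost` is a hypothetical helper, and the URL list is made up from the hosts used elsewhere in this thread.

```javascript
// Bucket resource URLs by hostname, similar in spirit to one folder per domain.
function groupByHost(urls) {
  const groups = {};
  for (const u of urls) {
    const host = new URL(u).hostname; // WHATWG URL, global in browsers and Node.js
    if (!groups[host]) groups[host] = [];
    groups[host].push(u);
  }
  return groups;
}

// Hypothetical input, e.g. copied from the Network tab.
const grouped = groupByHost([
  "http://127.0.0.42/js/A.js",
  "http://127.0.0.42/js/A1.js",
  "http://third.party.domain.tld/js/B.js",
  "http://third.party.domain.tld/css/B.css"
]);
console.log(grouped);
```

Note that this only tells you where each asset lives, not which script pulled it in.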

score 0 · Answer 3 · edited Jun 20 '20 at 09:12

https://www.charlesproxy.com/

Charles is a web proxy (HTTP Proxy / HTTP Monitor) that runs on your own computer. Your web browser (or any other Internet application) is then configured to access the Internet through Charles, and Charles is then able to record and display for you all of the data that is sent and received.

In Web and Internet development you are unable to see what is being sent and received between your web browser / client and the server. Without this visibility it is difficult and time-consuming to determine exactly where the fault is. Charles makes it easy to see what is happening, so you can quickly diagnose and fix problems.

score -2 · Answer 4 · answered Sep 21 '18 at 07:35


You could do it using some kind of bot, or just do it manually.

In all browsers you can view the HTML source code, so you could read the included paths from there.

Source: Source Code in Browser

Hope that is what you're looking for.

– Rapwnzel

I am looking for an automated process/tool, since I have megabytes of minified third-party sources. – Athlan Sep 21 '18 at 07:36
Then you would need to write some kind of web scraper that is reading the HTML header data and follows the path I guess. – Rapwnzel Sep 21 '18 at 07:39
